mediachain / L-SPACE

[DEPRECATED] Books = Knowledge = Power = (Mass x Distance^2) / Time^3
MIT License

Fulltext index (centralized) #50

Closed parkan closed 8 years ago

parkan commented 8 years ago

OrientDB appears to have only one sane fulltext engine, Lucene. Using Elasticsearch is not really supported.

Indexing

With SQL, we can create the index like this:

create index ImageBlob.title_description on ImageBlob (title,description) FULLTEXT ENGINE LUCENE

or with the Java API:

OSchema schema = databaseDocumentTx.getMetadata().getSchema();
OClass oClass = schema.createClass("ImageBlob");
//... create fields
// All indexed fields go in one array: the last parameter of createIndex is String... fields
oClass.createIndex("ImageBlob.title_description", "FULLTEXT", null, null, "LUCENE", new String[] { "title", "description" });

(Both of these use the standard analyzer, which is sufficient for now.)

Querying (SQL)

The query with SQL would then be:

select * from V where title LUCENE "test*"
select * from ImageBlob where [title,description] LUCENE "(title:foo AND description:bar)"

As far as I can tell, this allows querying an individual field within a multi-field index (first example) or any subset of the indexed fields, including all of them (second example). Querying a field that is not indexed simply returns no results, so querying against all vertices (V) is valid and returns only relevant results, i.e. results from classes that have had that field indexed.
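The per-field query string in the second example can be assembled programmatically. A minimal sketch follows, assuming each term is already valid Lucene syntax (so wildcards such as test* pass through untouched). LuceneQueryBuilder and allOf are hypothetical names for illustration and are not part of OrientDB or Lucene.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration; not part of OrientDB or Lucene.
final class LuceneQueryBuilder {
    private LuceneQueryBuilder() {}

    // Joins field -> term pairs into "(field1:term1 AND field2:term2)".
    // Assumption: terms are already valid Lucene syntax; nothing is escaped here.
    static String allOf(Map<String, String> termsByField) {
        if (termsByField.isEmpty()) {
            throw new IllegalArgumentException("at least one field is required");
        }
        StringJoiner joiner = new StringJoiner(" AND ", "(", ")");
        for (Map.Entry<String, String> entry : termsByField.entrySet()) {
            joiner.add(entry.getKey() + ":" + entry.getValue());
        }
        return joiner.toString();
    }
}
```

Passing an insertion-ordered map (for example a LinkedHashMap) with title mapped to foo and description mapped to bar should produce the same string used in the second query above.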

Querying (Gremlin)

It's not clear what support, if any, gremlin-scala has for this. It's possible with the normal Gremlin (TP2) client:

https://github.com/orientechnologies/orientdb/issues/5021

It seems like index support is generally a bit weak in orientdb-gremlin:

However if there are multiple indexes on the same property, or if there the traversal should better use a composite index, that's not handled well yet. If you feel inclined you can add these cases to the OrientGraphIndexTest.java.

We can try throwing tests in here: https://github.com/mpollmeier/orientdb-gremlin/blob/0883aa92b81db5762db0e8cade7fa81e2b7f5c32/driver/src/test/java/org/apache/tinkerpop/gremlin/orientdb/OrientGraphIndexTest.java

Alternately, we can suck it up and use the SQL queries directly.
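If we go the direct SQL route, the statement text can be built in one place and handed to whatever SQL execution call the client exposes. This is only a sketch: FulltextSql and select are hypothetical names, and queries containing double quotes are rejected rather than escaped, since the escaping rules are not covered in this thread.

```java
import java.util.List;

// Hypothetical helper for illustration; it only builds the statement text
// and does not talk to OrientDB.
final class FulltextSql {
    private FulltextSql() {}

    // Builds: select * from <className> where <field or [f1,f2]> LUCENE "<query>"
    static String select(String className, List<String> fields, String luceneQuery) {
        if (fields.isEmpty()) {
            throw new IllegalArgumentException("at least one field is required");
        }
        if (luceneQuery.contains("\"")) {
            // Assumption: quoting rules are out of scope here, so fail loudly instead of guessing.
            throw new IllegalArgumentException("double quotes in the query are not handled");
        }
        // A single field is written bare; several fields use the bracketed list form.
        String target = fields.size() == 1
                ? fields.get(0)
                : "[" + String.join(",", fields) + "]";
        return "select * from " + className + " where " + target
                + " LUCENE \"" + luceneQuery + "\"";
    }
}
```

Called with V, a single field title, and the query test*, this yields the first SQL query from the Querying (SQL) section; called with ImageBlob, the fields title and description, and the per-field query, it yields the second.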

CF: https://github.com/orientechnologies/orientdb-lucene/wiki/Full-Text-Index

parkan commented 8 years ago

This is still valid and will act as a useful crutch until the decentralized index design is complete.