inthefabric / Fabric

The collective mind awaits your input.
www.inthefabric.com

Perform ElasticSearch operations via API functions #14

Closed: zachkinstner closed this issue 11 years ago

zachkinstner commented 11 years ago

Using Gremlin, ElasticSearch is currently only useful for searches that start from an initial index lookup. It does not seem possible to search within a subset of nodes, for example within all the Artifacts created by a particular App. Relevant discussion is happening at https://github.com/thinkaurelius/titan/issues/272.

To perform a CONTAINS query:

g.query().has("Cl_Na", com.thinkaurelius.titan.core.attribute.Text.CONTAINS, "x").vertices()

zachkinstner commented 11 years ago

Consider whether the numeric indexing options for ElasticSearch are worthwhile for Fabric. They are currently enabled in the most recent commit.

They use the Query.Compare operators. These are basic options (equal/not equal, less than/less than or equal, greater than/greater than or equal) and are probably not very useful for Fabric in most cases. Remember that they must be used as the initial indexed lookup of a query; they aren't applied within a subset of query results (see #14).

Timestamps are probably the most useful case.

In a Gremlin query, use the full path, like com.tinkerpop.blueprints.Query.Compare.EQUAL.
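
To make the "initial indexed lookup" point concrete, here is a toy Groovy sketch of an ordered index answering each Compare operator as a range of keys. It is an illustration under stated assumptions, not Titan's or ElasticSearch's implementation: the Cmp enum (a stand-in for Query.Compare) and the put and lookup helpers are hypothetical, and numeric values such as timestamps are assumed to be stored as longs.

```groovy
// Hypothetical stand-in for the six Query.Compare operators
enum Cmp { EQUAL, NOT_EQUAL, GREATER_THAN, GREATER_THAN_EQUAL, LESS_THAN, LESS_THAN_EQUAL }

// Toy ordered index: property value (e.g. a timestamp as a long) -> ids of vertices with that value
void put(TreeMap<Long, List<String>> idx, long value, String id) {
    idx.get(value, []) << id   // Groovy's Map.get(key, default) stores the default if the key is absent
}

// Answer an operator as one or two key ranges of the ordered index
List<String> lookup(TreeMap<Long, List<String>> idx, Cmp op, long v) {
    List<Map<Long, List<String>>> ranges
    switch (op) {
        case Cmp.EQUAL:              ranges = [idx.subMap(v, true, v, true)]; break
        case Cmp.NOT_EQUAL:          ranges = [idx.headMap(v, false), idx.tailMap(v, false)]; break
        case Cmp.GREATER_THAN:       ranges = [idx.tailMap(v, false)]; break
        case Cmp.GREATER_THAN_EQUAL: ranges = [idx.tailMap(v, true)]; break
        case Cmp.LESS_THAN:          ranges = [idx.headMap(v, false)]; break
        case Cmp.LESS_THAN_EQUAL:    ranges = [idx.headMap(v, true)]; break
    }
    // Flatten the matching ranges into a list of vertex ids, in key order
    ranges.collectMany { range -> range.values().collectMany { ids -> ids } }
}

def idx = new TreeMap<Long, List<String>>()
put(idx, 100L, "a")
put(idx, 200L, "b")
put(idx, 300L, "c")
println lookup(idx, Cmp.GREATER_THAN, 200L)   // [c]
```

In this toy, every operator is a range scan over the entire index, which is why it fits the role of a query's starting point rather than a filter over an existing result set.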

zachkinstner commented 11 years ago

To perform a numeric "Compare" query:

g.query().has("A_Cr", com.tinkerpop.blueprints.Query.Compare.GREATER_THAN, 635038034428189752).vertices()

This currently returns an error:

{"timer":278,"err":"com.tinkerpop.rexster.client.RexProException> An error occurred while processing the script for language [groovy]. All transactions across all graphs in the session have been concluded with failure: javax.script.ScriptException: org.apache.commons.lang.NotImplementedException: Code is not implemented"}

This is caused by my changes in this commit, where I added two NotImplementedException cases for the new has() method overload.

zachkinstner commented 11 years ago

Two types of indexed lookups for the Factor.Identor.Value property:

g.V("F_IdV", "Melissa Kinstner")
g.query().has("F_IdV", com.thinkaurelius.titan.core.attribute.Text.CONTAINS, "mel").vertices()

Currently, the CONTAINS lookup isn't working. I think I'm building the index incorrectly: I forgot about the special case for creating both a Titan (standard) index and an ElasticSearch index on the same property. From the Titan documentation:

When using Titan’s standard index, the name argument to the indexed() method is optional. An equivalent definition of the name property key, which identifies the standard index by its name, would be:

graph.makeType().name("name").dataType(String.class).indexed("standard",Vertex.class).unique(Direction.BOTH).makePropertyKey();

The name “standard” is always reserved for Titan’s standard index backend.

zachkinstner commented 11 years ago

Interesting... perhaps I have found an indexing bug. The first search below returns results. The second does not:

g.query().has("F_IdV", com.thinkaurelius.titan.core.attribute.Text.CONTAINS, "Melissa").vertices()
g.query().has("F_IdV", com.thinkaurelius.titan.core.attribute.Text.CONTAINS, "melissa").vertices()

The lookup appears to be case-sensitive, which I thought was not supposed to happen with ElasticSearch lookups. Perhaps the dual indexing is a factor.

zachkinstner commented 11 years ago

The dual-indexing theory is out. The following query also returns no results, so the lookup is still case-sensitive:

g.query().has("Cl_Na", com.thinkaurelius.titan.core.attribute.Text.CONTAINS, "X").vertices()

zachkinstner commented 11 years ago

From the TitanServer output:

13/05/18 21:32:33 WARN transaction.StandardTitanTx: Query requires iterating over all vertices [(v[216172782113783874]CONTAINSx)]. For better performance, use indexes
13/05/18 21:32:52 WARN transaction.StandardTitanTx: Query requires iterating over all vertices [(v[216172782113783874]CONTAINSX)]. For better performance, use indexes

zachkinstner commented 11 years ago

New test, starting with an empty graph:

g.makeType().name("test").dataType(String.class).indexed("search",Vertex.class).unique(Direction.OUT).makePropertyKey();
g.addVertex([test: "Elastic"]);
g.query().has("test", com.thinkaurelius.titan.core.attribute.Text.CONTAINS, "Elastic").vertices() //works
g.query().has("test", com.thinkaurelius.titan.core.attribute.Text.CONTAINS, "elastic").vertices() //works
g.query().has("test", com.thinkaurelius.titan.core.attribute.Text.CONTAINS, "E").vertices() //no result
g.query().has("test", com.thinkaurelius.titan.core.attribute.Text.CONTAINS, "Elasti").vertices() //no result

No warnings in the TitanServer output. Why aren't the partial-text searches working?
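
The four results above are consistent with a whole-token, case-insensitive match: the indexed value appears to be lowercased and split into words, and CONTAINS succeeds only when the query term equals one of those words. The Groovy sketch below models that reading of the results. It is an assumption inferred from this thread, not Titan's or ElasticSearch's actual analyzer; tokenize and indexedContains are hypothetical names, and the query term is assumed to be a single word.

```groovy
// Hypothetical analyzer: lowercase the text, then split it on anything
// that is not a letter or digit, dropping empty pieces
Set<String> tokenize(String text) {
    text.toLowerCase().split(/[^\p{L}\p{N}]+/).toList().findAll { !it.isEmpty() }.toSet()
}

// Indexed CONTAINS as this thread's results suggest: the single-word query
// term must equal one whole token of the value, ignoring case
boolean indexedContains(String value, String term) {
    tokenize(value).contains(term.toLowerCase())
}

["Elastic", "elastic", "E", "Elasti"].each { term ->
    println "CONTAINS '${term}' on 'Elastic': ${indexedContains('Elastic', term)}"
}
// prints true, true, false, false, matching the //works and //no result comments on the queries
```

Under this model, a partial term such as "Elasti" can never match, because it is not a complete token of the indexed value.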

zachkinstner commented 11 years ago

Started a discussion thread about this.

zachkinstner commented 11 years ago

Regarding the numeric "Compare" error above, switching the argument order (value first, then the Compare operator) does not generate the same error:

g.query().has("A_Cr", 1234, com.tinkerpop.blueprints.Query.Compare.GREATER_THAN).vertices()

zachkinstner commented 11 years ago

This latest commit fixes the numeric "compare" issue noted above.

zachkinstner commented 11 years ago

The partial-text search works (case-sensitive) if the property is not indexed:

g.makeType().name("test").dataType(String.class).unique(Direction.OUT).makePropertyKey();
g.addVertex([test: "Elastic"]);
g.query().has("test", com.thinkaurelius.titan.core.attribute.Text.CONTAINS, "E").vertices(); //works
g.query().has("test", com.thinkaurelius.titan.core.attribute.Text.CONTAINS, "e").vertices(); //no results

These queries produce the following Titan warnings:

13/05/19 13:54:57 WARN transaction.StandardTitanTx: Query requires iterating over all vertices [(v[36028797018963978]CONTAINSE)]. For better performance, use indexes
13/05/19 13:57:08 WARN transaction.StandardTitanTx: Query requires iterating over all vertices [(v[36028797018963978]CONTAINSe)]. For better performance, use indexes
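
These results are consistent with the unindexed path falling back to a full scan (as the warnings say) and testing each value with a plain, case-sensitive substring check. The same reading also fits the earlier "x"/"X" lookups on Cl_Na, which produced the same warning. A minimal Groovy model of that assumption follows; scanContains and fullScan are hypothetical helpers, not Titan API.

```groovy
// Hypothetical model of the unindexed fallback: every vertex's value is
// tested in memory with a plain, case-sensitive substring check
boolean scanContains(String value, String term) {
    value.contains(term)
}

// Simulated "iterating over all vertices": keep every value that passes the check
List<String> fullScan(List<String> values, String term) {
    values.findAll { scanContains(it, term) }
}

def values = ["Elastic"]
println fullScan(values, "E")   // [Elastic]
println fullScan(values, "e")   // []
```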

zachkinstner commented 11 years ago

After #16 is complete, add more Root traversal functions that use non-ID graph indexes, most notably the ElasticSearch indexes on names (for example, a GetClassesByName function).

zachkinstner commented 11 years ago

The notes above (which also became a forum discussion) resulted in issue thinkaurelius/titan#296.

zachkinstner commented 11 years ago

Aside from that issue, I think this is all handled now. See #23 for further details.