zachkinstner closed 11 years ago
Consider whether the numeric indexing options for ElasticSearch are worthwhile for Fabric. They are currently enabled in the most recent commit.
They use the Query.Compare operators. These are basic comparisons (equal, not equal, less than, less than or equal, greater than, greater than or equal) and are probably not too useful (in most cases) for Fabric. Remember, they have to be used as the initial indexed lookup for the query -- they aren't used within a query result subset (see #14).
Probably the most useful case would be for timestamps.
In a Gremlin query, use the full path, like com.tinkerpop.blueprints.Query.Compare.EQUAL.
To perform a numeric "Compare" query:
g.query().has("A_Cr", com.tinkerpop.blueprints.Query.Compare.GREATER_THAN, 635038034428189752).vertices()
Currently returns an error:
{"timer":278,"err":"com.tinkerpop.rexster.client.RexProException> An error occurred while processing the script for language [groovy]. All transactions across all graphs in the session have been concluded with failure: javax.script.ScriptException: org.apache.commons.lang.NotImplementedException: Code is not implemented"}
This is due to my changes from this commit, where I added two NotImplementedException cases for this new has method overload.
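The literal in the compare query above (635038034428189752) looks like it could be a .NET-style tick count, i.e. 100-nanosecond intervals since 0001-01-01. The thread does not state this, so treat it as an assumption. Under that assumption, here is a minimal standalone Python sketch for converting between ticks and dates, which is handy when picking a bound for a GREATER_THAN lookup. The helper names are hypothetical and not part of Fabric or Titan.

```python
from datetime import datetime, timedelta

# Assumption: one tick is 100 ns, counted from 0001-01-01 (the .NET convention).
TICKS_PER_MICROSECOND = 10
EPOCH = datetime(1, 1, 1)

def ticks_to_datetime(ticks):
    """Convert a tick count to a datetime, dropping sub-microsecond precision."""
    return EPOCH + timedelta(microseconds=ticks // TICKS_PER_MICROSECOND)

def datetime_to_ticks(moment):
    """Convert a datetime to a tick count using exact integer arithmetic."""
    delta = moment - EPOCH
    micros = (delta.days * 86400 + delta.seconds) * 1_000_000 + delta.microseconds
    return micros * TICKS_PER_MICROSECOND

# Interpret the value from the query, then build a bound for "created after 2013-05-01".
print(ticks_to_datetime(635038034428189752))
print(datetime_to_ticks(datetime(2013, 5, 1)))
```

If the assumption holds, the first value falls in May 2013, which is consistent with the dates in the server logs in this thread.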
Two types of indexed lookups for the Factor.Indentor.Value property:
g.V("F_IdV", "Melissa Kinstner")
g.query().has("F_IdV", com.thinkaurelius.titan.core.attribute.Text.CONTAINS, "mel").vertices()
Currently, the CONTAINS lookup isn't working. I think I'm building the index incorrectly. I forgot about a special case for creating both a Titan and an ElasticSearch index. From this Titan documentation:
When using Titan’s standard index, the name argument to the indexed() method is optional. An equivalent definition of the name property key, which identifies the standard index by its name, would be:
graph.makeType().name("name").dataType(String.class).indexed("standard",Vertex.class).unique(Direction.BOTH).makePropertyKey();
The name “standard” is always reserved for Titan’s standard index backend.
Interesting... perhaps I have found an indexing bug. The first search below returns results. The second does not:
g.query().has("F_IdV", com.thinkaurelius.titan.core.attribute.Text.CONTAINS, "Melissa").vertices()
g.query().has("F_IdV", com.thinkaurelius.titan.core.attribute.Text.CONTAINS, "melissa").vertices()
There seems to be case-sensitivity happening here, which I thought was not supposed to occur with ElasticSearch lookups. Perhaps the dual-indexing is a factor here.
The dual-indexing theory is out. The following returns no results, either. Case-sensitivity is still occurring:
g.query().has("Cl_Na", com.thinkaurelius.titan.core.attribute.Text.CONTAINS, "X").vertices()
From the TitanServer output:
13/05/18 21:32:33 WARN transaction.StandardTitanTx: Query requires iterating over all vertices [(v[216172782113783874]CONTAINSx)]. For better performance, use indexes
13/05/18 21:32:52 WARN transaction.StandardTitanTx: Query requires iterating over all vertices [(v[216172782113783874]CONTAINSX)]. For better performance, use indexes
New test, starting with an empty graph:
g.makeType().name("test").dataType(String.class).indexed("search",Vertex.class).unique(Direction.OUT).makePropertyKey();
g.addVertex([test: "Elastic"]);
g.query().has("test", com.thinkaurelius.titan.core.attribute.Text.CONTAINS, "Elastic").vertices() //works
g.query().has("test", com.thinkaurelius.titan.core.attribute.Text.CONTAINS, "elastic").vertices() //works
g.query().has("test", com.thinkaurelius.titan.core.attribute.Text.CONTAINS, "E").vertices() //no result
g.query().has("test", com.thinkaurelius.titan.core.attribute.Text.CONTAINS, "Elasti").vertices() //no result
No warnings in the TitanServer output. Why aren't the partial-text searches working?
Started a discussion thread about this.
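One possible explanation for the results above, offered as an assumption rather than confirmed Titan or ElasticSearch behavior: if the search index stores each value as lowercased whole-word tokens, and CONTAINS compares whole tokens rather than substrings, then "Elastic" and "elastic" would match while "E" and "Elasti" would not. The standalone Python model below reproduces that pattern. The functions analyze and contains_token are hypothetical stand-ins, not Titan or ElasticSearch calls.

```python
import re

def analyze(text):
    """Lowercase the text and split it into whole-word tokens (stand-in for an index analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_token(stored_value, query):
    """Return True when every query token equals some token of the stored value."""
    stored_tokens = set(analyze(stored_value))
    query_tokens = analyze(query)
    return bool(query_tokens) and all(t in stored_tokens for t in query_tokens)

# Same four probes as the test against the "search" index above.
for query in ["Elastic", "elastic", "E", "Elasti"]:
    print(query, contains_token("Elastic", query))
```

Under this model, prefix or substring matching would need a different predicate or a different analyzer. Whether Titan exposes one is not something this sketch settles.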
Regarding the comment about numeric indexes, switching the order does not generate the same error:
g.query().has("A_Cr", 1234, com.tinkerpop.blueprints.Query.Compare.GREATER_THAN).vertices()
This latest commit fixes the numeric "compare" issue noted above.
The partial-text search works (case-sensitive) if the property is not indexed:
g.makeType().dataType(String.class).name("test").unique(OUT).makePropertyKey();
g.addVertex([test: "Elastic"]);
g.query().has("test", com.thinkaurelius.titan.core.attribute.Text.CONTAINS, "E").vertices(); //works
g.query().has("test", com.thinkaurelius.titan.core.attribute.Text.CONTAINS, "e").vertices(); //no results
This creates the following Titan warnings:
13/05/19 13:54:57 WARN transaction.StandardTitanTx: Query requires iterating over all vertices [(v[36028797018963978]CONTAINSE)]. For better performance, use indexes
13/05/19 13:57:08 WARN transaction.StandardTitanTx: Query requires iterating over all vertices [(v[36028797018963978]CONTAINSe)]. For better performance, use indexes
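The warnings suggest that, without an index, the query is evaluated by scanning every vertex. The observed results are consistent with that scan applying a plain case-sensitive substring test, although this is an inference from the output and not something confirmed from the Titan source. A minimal standalone Python model of that behavior, with scan_contains as a hypothetical name:

```python
def scan_contains(values, query):
    """Filter every value with a case-sensitive substring test (no index involved)."""
    return [value for value in values if query in value]

values = ["Elastic"]
print(scan_contains(values, "E"))  # the stored value contains an uppercase "E"
print(scan_contains(values, "e"))  # "Elastic" has no lowercase "e"
```

This is the opposite of the indexed case, where whole-word lookups ignored case but partial text returned nothing.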
After #16 is complete, add more Root traversal functions based on non-ID graph indexes. Most notably, the ElasticSearch indexes on names, such as GetClassesByName.
The notes above (which also became a forum discussion) resulted in issue thinkaurelius/titan#296.
Aside from that issue, I think this is all handled now. See #23 for further details.
Using Gremlin, ElasticSearch is currently only useful for searches based on an initial index. It does not seem possible to do a search within a subset of nodes, for example, within all the Artifacts created by a particular App. Relevant discussion is happening at https://github.com/thinkaurelius/titan/issues/272.
To perform a CONTAINS query: