thinkaurelius / titan

Distributed Graph Database
http://titandb.io
Apache License 2.0

Ordered Traversal on Titan Using Elasticsearch #1299

Open fppt opened 8 years ago

fppt commented 8 years ago

Using Titan 1.0.0 with Elasticsearch as the indexing backend, I create the following mixed index:

TitanGraph graph = TitanFactory.open("titan-cassandra-es.properties");
TitanManagement management = graph.openManagement();

PropertyKey typeKey = management.makePropertyKey("TYPE").dataType(String.class).make();
PropertyKey degreeKey = management.makePropertyKey("DEGREE").dataType(Long.class).make();

management.buildIndex("byTypeDegree", Vertex.class)
    .addKey(typeKey)
    .addKey(degreeKey)
    .buildMixedIndex("search");

management.commit();

The goal is to search for vertices of a specific type and order them by degree. I believe the following traversal should achieve that:

graph.traversal().V().has("TYPE", "person").order().by("DEGREE").limit(80);

However, the traversal is clearly not using the index, because I get the following error:

Could not execute query since pre-sorting requires fetching more than 1000000 elements. Consider rewriting the query to exploit sort orders

What's odd is that I have confirmed that Elasticsearch can answer this query very quickly. When I send the following query directly to Elasticsearch:

curl -XGET 'localhost:9200/titan/byTypeDegree/_search?size=80' -d '
{
  "sort": [
    { "DEGREE": { "order": "desc" } }
  ],
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "TYPE": "person" } }
          ]
        }
      }
    }
  }
}'

I get the results I need:

"hits": [
  {
    "_index": "titan",
    "_type": "byTypeDegree",
    "_id": "izaqnk",
    "_score": null,
    "_source": {
      "TYPE": "http://mindmaps.io/person",
      "DEGREE": 140
    },
    "sort": [
      140
    ]
  },
  {
    "_index": "titan",
    "_type": "byTypeDegree",
    "_id": "8j5oxk",
    "_score": null,
    "_source": {
      "TYPE": "http://mindmaps.io/person",
      "DEGREE": 112
    },
    "sort": [
      112
    ]
  },
...

So why can't Titan execute the traversal using the index? Am I creating the index incorrectly, or is the traversal incorrect?

boliza commented 8 years ago

@fppt Your query should be:

g.V().has("TYPE", textContains('person')).order().by("DEGREE",incr).limit(80);
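
To illustrate why a token-based predicate finds these vertices while a plain equality check on "person" would not: the stored TYPE values shown in the question are full URIs such as http://mindmaps.io/person. The sketch below is plain Java with a deliberately simplified tokenizer (an assumption for illustration, not the analyzer Elasticsearch actually uses), and the class and method names are hypothetical. It also suggests that with an exact-match mapping the query value would presumably have to be the full stored string.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

class TextVsExactMatchSketch {

    // Simplified stand-in for a full-text analyzer (an assumption, not the
    // Elasticsearch implementation): lower-case, then split on non-alphanumerics.
    static List<String> tokenize(String value) {
        return Arrays.stream(value.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    // Token-based match, in the spirit of textContains on a tokenized field.
    static boolean containsToken(String stored, String term) {
        return tokenize(stored).contains(term.toLowerCase(Locale.ROOT));
    }

    // Whole-value match, in the spirit of has(key, value) on an untokenized field.
    static boolean exactMatch(String stored, String value) {
        return stored.equals(value);
    }

    public static void main(String[] args) {
        String stored = "http://mindmaps.io/person";
        System.out.println("tokens:                 " + tokenize(stored));
        System.out.println("containsToken(person):  " + containsToken(stored, "person"));
        System.out.println("exactMatch(person):     " + exactMatch(stored, "person"));
        System.out.println("exactMatch(full value): " + exactMatch(stored, stored));
    }
}
```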

If you want to keep your original query, your schema should be:

management.buildIndex("byTypeDegree", Vertex.class)
    .addKey(typeKey, Mapping.STRING.asParameter())  // exact-match mapping for the String key
    .addKey(degreeKey)                              // Long key: the string mappings do not apply
    .buildMixedIndex("search");

Because you did not specify a Mapping when defining the index, Titan uses the default, Mapping.DEFAULT, which for String keys is treated the same as Mapping.TEXT.

When Titan executes a query, it checks whether the index supports the predicate used on each queried key. See the code in ElasticSearchIndex, line 967:

if (AttributeUtil.isString(dataType)) {
    switch (mapping) {
        // DEFAULT falls through to TEXT: only full-text predicates are supported
        case DEFAULT:
        case TEXT:
            return titanPredicate == Text.CONTAINS || titanPredicate == Text.CONTAINS_PREFIX || titanPredicate == Text.CONTAINS_REGEX;
        // STRING: exact-match predicates, including Cmp.EQUAL as used by has(key, value)
        case STRING:
            return titanPredicate == Cmp.EQUAL || titanPredicate == Cmp.NOT_EQUAL || titanPredicate == Text.REGEX || titanPredicate == Text.PREFIX;
        // TEXTSTRING: both families of predicates
        case TEXTSTRING:
            return (titanPredicate instanceof Text) || titanPredicate == Cmp.EQUAL || titanPredicate == Cmp.NOT_EQUAL;
    }
}
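
The check above can be mirrored in a small standalone sketch to see which mapping/predicate combinations are accepted. This is not Titan code: the Mapping and Predicate enums below are hypothetical stand-ins that only reuse the names from the excerpt, and the logic is limited to String keys. It suggests why has("TYPE", "person"), an equality predicate, is not answered by the index under the default mapping, while textContains is.

```java
import java.util.EnumSet;
import java.util.Set;

class MappingSupportSketch {

    // Hypothetical stand-ins for Titan's Mapping values and query predicates.
    enum Mapping { DEFAULT, TEXT, STRING, TEXTSTRING }
    enum Predicate { EQUAL, NOT_EQUAL, CONTAINS, CONTAINS_PREFIX, CONTAINS_REGEX, PREFIX, REGEX }

    // Predicates accepted for tokenized (full-text) String fields.
    static final Set<Predicate> FULL_TEXT =
            EnumSet.of(Predicate.CONTAINS, Predicate.CONTAINS_PREFIX, Predicate.CONTAINS_REGEX);

    // Predicates accepted for untokenized (exact-match) String fields.
    static final Set<Predicate> EXACT =
            EnumSet.of(Predicate.EQUAL, Predicate.NOT_EQUAL, Predicate.REGEX, Predicate.PREFIX);

    // Mirrors the switch in the excerpt, for String keys only.
    static boolean supports(Mapping mapping, Predicate predicate) {
        switch (mapping) {
            case DEFAULT:
            case TEXT:
                return FULL_TEXT.contains(predicate);
            case STRING:
                return EXACT.contains(predicate);
            case TEXTSTRING:
                // Every predicate modeled here is either a text predicate
                // or EQUAL / NOT_EQUAL, so all of them are accepted.
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        // has("TYPE", "person") is an equality predicate.
        System.out.println("DEFAULT + EQUAL:    " + supports(Mapping.DEFAULT, Predicate.EQUAL));
        // has("TYPE", textContains("person")) is a full-text predicate.
        System.out.println("DEFAULT + CONTAINS: " + supports(Mapping.DEFAULT, Predicate.CONTAINS));
        System.out.println("STRING + EQUAL:     " + supports(Mapping.STRING, Predicate.EQUAL));
        System.out.println("STRING + CONTAINS:  " + supports(Mapping.STRING, Predicate.CONTAINS));
    }
}
```

When the predicate is not supported by the index, the has() condition presumably cannot be pushed down to Elasticsearch, which would be consistent with the pre-sorting error reported in the question.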

See more about index parameters here: http://s3.thinkaurelius.com/docs/titan/1.0.0/index-parameters.html#text-search