thinkaurelius / titan

Distributed Graph Database
http://titandb.io
Apache License 2.0

ES mixed Index doesn't update (TP 3.1.0) #1181

Open dmill-bz opened 8 years ago

dmill-bz commented 8 years ago

This was done against a build of the tp3-ci-31 branch while using the titan-cassandra-es.properties configuration file.

When you first set a property value, it is indexed properly and can be found with a direct index search. But if you then update that property, the change does not propagate to the index: the old value still matches and the new one does not. Here's an example in the console:

         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph
gremlin> :remote connect tinkerpop.server conf/remote.yaml
==>Connected - localhost/127.0.0.1:8182
gremlin> :> graphT.tx().rollback();mgmt=graphT.openManagement();schemaLabel = mgmt.makeVertexLabel('someLabel').make();TempIndex1 = mgmt.makePropertyKey('content').dataType(String.class).make();custIndex = mgmt.buildIndex('contentOnSomeLabel',Vertex.class).addKey(TempIndex1, com.thinkaurelius.titan.core.schema.Parameter.of("mapping",Mapping.TEXT)).indexOnly(schemaLabel).buildMixedIndex("search");mgmt.commit()
==>null
gremlin> :> graphT.indexQuery("contentOnSomeLabel", "v.content:*").vertices().size()
==>0
gremlin> :> graphT.addVertex(label, "someLabel", "content", "someKeyWord")
==>v[4096]
gremlin> :> t.V(4096).valueMap()
==>{content=[someKeyWord]}
gremlin> :> graphT.indexQuery("contentOnSomeLabel", "v.content:*").vertices().size()
==>1
gremlin> :> graphT.indexQuery("contentOnSomeLabel", "v.content:someKeyWord").vertices().size()
==>1
gremlin> :> t.V(4096).property("content", "aNewKeyword")
==>v[4096]
gremlin> :> t.V(4096).valueMap()
==>{content=[aNewKeyword]}
gremlin> :> graphT.indexQuery("contentOnSomeLabel", "v.content:aNewKeyWord").vertices().size()
==>0
gremlin> :> graphT.indexQuery("contentOnSomeLabel", "v.content:someKeyWord").vertices().size()
==>1

If this was already covered in another branch, please let me know. I haven't really had a look around.

dalaro commented 8 years ago

What was the ES refresh interval? Was it allowed to elapse between modifying the property and querying it through ES, or equivalently, was a refresh manually forced out-of-band?

spmallette commented 8 years ago

@PommeVerte any update on this one?

dmill-bz commented 8 years ago

Hey, sorry for the late reply.

I've tried this again on titan11 and it seems to work correctly:

         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph
gremlin> :remote connect tinkerpop.server conf/remote.yaml
==>Connected - localhost/127.0.0.1:8182
gremlin> :> graphT.tx().rollback();mgmt=graphT.openManagement();schemaLabel = mgmt.makeVertexLabel('someLabel').make();TempIndex1 = mgmt.makePropertyKey('content').dataType(String.class).make();custIndex = mgmt.buildIndex('contentOnSomeLabel',Vertex.class).addKey(TempIndex1, com.thinkaurelius.titan.core.schema.Parameter.of("mapping",Mapping.TEXT)).indexOnly(schemaLabel).buildMixedIndex("search");mgmt.commit()
==>null
gremlin> :> graphT.indexQuery("contentOnSomeLabel", "v.content:*").vertices().size()
==>0
gremlin> :> graphT.addVertex(label, "someLabel", "content", "someKeyWord")
==>v[4200]
gremlin> :> t.V(4200).valueMap()
==>{content=[someKeyWord]}
gremlin> :> graphT.indexQuery("contentOnSomeLabel", "v.content:*").vertices().size()
==>1
gremlin> :> graphT.indexQuery("contentOnSomeLabel", "v.content:someKeyWord").vertices().size()
==>1
gremlin> :> t.V(4200).property("content", "aNewKeyword")
==>v[4200]
gremlin> :> t.V(4200).valueMap()
==>{content=[aNewKeyword]}
gremlin> :> graphT.indexQuery("contentOnSomeLabel", "v.content:aNewKeyWord").vertices().size()
==>1
gremlin> :> graphT.indexQuery("contentOnSomeLabel", "v.content:someKeyWord").vertices().size()
==>0

I'm going to re-enable some of my tests and if all is well I'll close this.

dhly-etc commented 8 years ago

This problem is still present in the titan11 branch. The errors are caused by ES 1.5 disabling dynamic Groovy scripting by default. It looks like this configuration change was overlooked in the titan09 migration from ES 1.2 to 1.5. I believe the default actually changed in ES 1.4.3.

Sandboxing Groovy scripts by adding script.groovy.sandbox.enabled: true to conf/es/elasticsearch.yml seems to solve the issue, but I'm not sure about the potential security ramifications. The scripting configuration changed again in ES 1.6+, so future migrations will need to account for this as well.
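
For reference, a minimal sketch of that workaround as it would look in the config file. It contains only the one setting named above and assumes the ES 1.5 line bundled with titan11; it has not been checked against ES 1.6+, where the scripting settings differ. As far as I know, the sandbox was turned off by default in ES 1.4.3 in response to a Groovy sandbox-escape vulnerability (CVE-2015-1427), so this is probably only reasonable on a node that untrusted clients cannot reach.

```yaml
# conf/es/elasticsearch.yml (ES 1.5.x; assumption: the embedded/bundled node reads this file)
# Re-enable sandboxed dynamic Groovy scripts so property updates can reach the mixed index.
# Caution: this reverses a security-motivated default; restrict network access to the node.
script.groovy.sandbox.enabled: true
```

Elasticsearch has to be restarted for the change to take effect. Re-running the console session above should then return 1 for the new keyword and 0 for the old one.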