orientechnologies / orientdb-gremlin

TinkerPop3 Graph Structure Implementation for OrientDB
Apache License 2.0
91 stars 32 forks source link

Vertices "disappearing" after indexing on T.label #108

Open mikolak-net opened 7 years ago

mikolak-net commented 7 years ago

In relation to the gitter conversation, it appears creating an index on T.label causes vertices to "disappear" from traversals on T.label.

This is relevant when e.g. using gremlin-scala to traverse all vertices of a given class.

Example code for reproducing the problem in below:

import com.orientechnologies.orient.core.metadata.schema.OType
import org.apache.commons.configuration.BaseConfiguration
import org.apache.tinkerpop.gremlin.orientdb.OrientGraphFactory
import org.apache.tinkerpop.gremlin.structure.T

object VerticesBGoneApp extends App {
  val TestLabel = "Test"

  val g = new OrientGraphFactory("plocal:./testdb").getTx

  val vertex = g.addVertex(T.label, TestLabel)

  println(g.database().browseClass(s"v_$TestLabel").next) //prints raw document
  println(g.traversal().V().has(T.label, TestLabel).toList) //prints the labeled vertex

  createIndexOnLabel(TestLabel)

  println(g.database().browseClass(s"v_$TestLabel").next) //raw document still here
  println(g.traversal().V().toList) //this is also non-empty
  println(g.traversal().V().has(T.label, TestLabel).toList) //but *this* is now empty

  private def createIndexOnLabel(className: String) = {
    val key = T.label.getAccessor
    val config = new BaseConfiguration()
    config.setProperty("keytype", OType.STRING)
    if (!g.getVertexIndexedKeys(className).contains()) {
      g.createVertexIndex(key, TestLabel, config)
    }
  }
}
mpollmeier commented 7 years ago

hey, just to let you know, I've reproduced this and can confirm it's a problem. i guess it's to do with the way we lookup values in indexes...

next steps: we'll need to add a test to https://github.com/mpollmeier/orientdb-gremlin/blob/master/driver/src/test/java/org/apache/tinkerpop/gremlin/orientdb/OrientGraphIndexTest.java and fix it, probably in OrientGraphStep.findIndex

will have a look once i get some time. if you are trying out things yourself just update this thread

mpollmeier commented 7 years ago

ok, it's actually not a bug. you're using indexes the wrong way :) an index purely on a label doesn't make sense - it's in it's own class, so there's basically an automatic index on it. Indexes only make sense if you use them on a specific key, see some use cases in If you look at https://github.com/mpollmeier/orientdb-gremlin/blob/master/driver/src/test/java/org/apache/tinkerpop/gremlin/orientdb/OrientGraphIndexTest.java

I've modified your example so that it works. It will only make sense though if you actually add a key/value and an index for a specific key.

import com.orientechnologies.orient.core.metadata.schema.{ OClass, OType }
import org.apache.commons.configuration.BaseConfiguration
import org.apache.tinkerpop.gremlin.orientdb.OrientGraphFactory
import org.apache.tinkerpop.gremlin.structure.T

object VerticesBGoneApp extends App {
  val TestLabel = "Test"
  val TestKey = "SomeKey"

  val g = new OrientGraphFactory("memory:testdb").getTx

  val vertex = g.addVertex(T.label, TestLabel)

  println("XXXXXXXXXXXXXX")
  println(g.database().browseClass(s"v_$TestLabel").next) //prints raw document
  println(g.traversal().V().has(T.label, TestLabel).toList) //prints the labeled vertex
  println("XXXXXXXXXXXXXX")

  createIndexOnLabel(TestKey, TestLabel)

  println("YYYYYYYYYYYYYYYYYYYY")
  println(g.database().browseClass(s"v_$TestLabel").next) //raw document still here
  println(g.traversal().V().toList) //this is also non-empty
  println(g.traversal().V().has(T.label, TestLabel).toList) //vertex still there :)
  println("YYYYYYYYYYYYYYYYYYYY")

  private def createIndexOnLabel(className: String, key: String) = {
    val config = new BaseConfiguration()
    config.setProperty("type", OClass.INDEX_TYPE.UNIQUE.name());
    config.setProperty("keytype", OType.STRING)
    g.createVertexIndex(key, TestLabel, config);
  }
}
mikolak-net commented 7 years ago

@mpollmeier : thanks for the quick reaction. Your explanation makes complete sense. In fact, this was how I originally was creating the index (and furthermore I based that original reasoning partially on OrientGraphIndexTest :) ).

However, this way the index is not picked up for the has(T.label, TestLabel) traversal. You can see it for yourself when running the amended version, as it will output:

WARNING: $ANSI{green {db=testdb}} scanning through all elements without using an index for Traversal [OrientGraphStep(vertex,[~label.eq(Test)])]

The original version of the test code produces no such warning (there's only the one for the complete vertex traversal, but that is to be expected).

Perhaps the problem is not with indexing, but the traversal strategy?

mpollmeier commented 7 years ago

if you just use has(T.label, TestLabel) then you can ignore that warning - it wouldn't make sense to use any index since you get all vertices from that class anyway. same as a full table scan in a RDBMS. So you can ignore the warning in this case. Only if you further filter on a specific key should you make sure that you've got an index.

mikolak-net commented 7 years ago

Yes, the full table traversal analogy is pretty much obvious. And although the "disappearing" vertex problem when indexing still exists, it looks like it's more relevant to OrientDB itself.

One final thing to clarify - so, basically the warning is superfluous in this case, right? Is there enough information provided to the TraversalStrategy in order to avoid printing it out in this case?

mpollmeier commented 7 years ago

Yup the warning is superfluous and could be avoided quite easily.

mikolak-net commented 7 years ago

OK, how would the avoidance strategy look like? A check on the graph features, something like that?

mpollmeier commented 7 years ago

The function that looks up indexes is OrientGraphStep.findIndex. It has access to all steps in the traversal and at the moment just iterates through the hasContainers. See https://github.com/orientechnologies/orientdb-gremlin/blob/master/driver/src/main/java/org/apache/tinkerpop/gremlin/orientdb/traversal/step/sideEffect/OrientGraphStep.java#L157

One way to achieve what you want would be to check if this is a simple traversal where no index would help anyway, e.g. graph.V.hasLabel, i.e. no hasContainer present.