mpollmeier / gremlin-scala

[unmaintained] Scala wrapper for Apache TinkerPop 3 Graph DSL
Apache License 2.0

Performance issues with Indices #28

Closed joan38 closed 10 years ago

joan38 commented 10 years ago

Hi,

I tried the following two queries in Titan and Neo4j, and it seems they are not using the index optimisation (~O(1)) and are staying in O(n):

graph.V.has("dmdid", id).has("type", type).toStream()
graph.query().has("dmdid", id).has("type", type).vertices().toStream

This is a major issue since we can't use gremlin-scala once we start to put some data in the graph: queries take ages.
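
To illustrate the difference being described, here is a minimal sketch in plain Scala. It is not the Blueprints, Titan, or gremlin-scala API: `Vertex`, `IndexSketch`, `scan`, `buildIndex`, and `viaIndex` are hypothetical names, and the "index" is just an in-memory hash map. It only shows why filtering all vertices grows with the graph size while a key lookup followed by a filter on the few candidates does not.

```scala
// Hypothetical in-memory model; not the Blueprints, Titan, or gremlin-scala API.
final case class Vertex(id: Long, props: Map[String, Any])

object IndexSketch {
  // O(n): inspects every vertex, like a pipeline started from all vertices.
  def scan(vertices: Seq[Vertex], dmdid: String, tpe: String): Seq[Vertex] =
    vertices.filter { v =>
      v.props.get("dmdid").contains(dmdid) && v.props.get("type").contains(tpe)
    }

  // Built once: maps each value of `key` to the vertices carrying that value.
  def buildIndex(vertices: Seq[Vertex], key: String): Map[Any, Seq[Vertex]] =
    vertices.filter(_.props.contains(key)).groupBy(_.props(key))

  // Hash lookup on dmdid first, then filter the small candidate set on type.
  def viaIndex(index: Map[Any, Seq[Vertex]], dmdid: String, tpe: String): Seq[Vertex] =
    index.getOrElse(dmdid, Seq.empty[Vertex]).filter(_.props.get("type").contains(tpe))
}
```

Both functions return the same vertices; the second one only touches the vertices that share the looked-up `dmdid`.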

Cheers

mpollmeier commented 10 years ago

Hi Joan,

Can you send me a complete setup so that I can reproduce the exact problem? Everything else would be stabbing in the dark.

Cheers Michael


joan38 commented 10 years ago

Yes, I will create a test project. But this seems to be an issue in Gremlin-Java because, if I understand correctly,

graph.V.has("dmdid", id).has("type", type).toStream()

is the same as

new GremlinPipeline(graph.getVertices()).has("dmdid", id).has("type", type).toList()

and neither uses indices.

mpollmeier commented 10 years ago

Yup that sounds right - you might wanna post it directly on the mailing list, but I can have a quick look beforehand if you like.

Cheers Michael


joan38 commented 10 years ago

Hi Michael,

Sorry for the delay, Devoxx UK oblige!

So I set up a project with some data that I use in pharmaceutics: https://github.com/joan38/TestTitan

(You'll also find another issue that I have with Titan: using an external indexing backend doesn't work.)

mpollmeier commented 10 years ago

Thanks Joan. This does indeed look like a Titan-specific question: once you know how to use indexes in GremlinPipeline, the same applies to GremlinScala.

Some threads like this one do something similar to javaPipes1:

gremlinPipeline.start(titanGraph.getVertices("name", "hercules")).out("father", "mother").property("type");

https://groups.google.com/forum/#!searchin/aureliusgraphs/GremlinPipeline$20index/aureliusgraphs/0zSs6SQACas/wTF8Aiim2vIJ

It would be good to know if that's the only option we have; then we can open up GremlinScala to that approach. https://groups.google.com/forum/#!forum/aureliusgraphs

joan38 commented 10 years ago

Hi,

I think this is more a Gremlin-Java design issue than a Titan one because:

So I will post on the Aurelius group.

joan38 commented 10 years ago

Here is the discussion on the Aurelius group: https://groups.google.com/forum/#!topic/aureliusgraphs/zly7KtpIDBk

mpollmeier commented 10 years ago

Thanks for your effort Joan. I understand the problem and will try and get this fixed for the 2.5 stream. Note that this is not an issue in GremlinScala 3 (see tinkerpop3 branch), but that's only in alpha stage.

joan38 commented 10 years ago

Cheers Michael! It's a pleasure to help.

Are you speaking about the side effect of this:

"TinkerPop2 and below made a sharp distinction between the various TinkerPop projects: Blueprints, Pipes, Gremlin, Frames, Furnace, and Rexster.
With TinkerPop3, all of these projects have been merged and are generally known as Gremlin.
Blueprints → Gremlin Structure API : Pipes → GraphTraversal : Frames → Traversal : Furnace → GraphComputer and VertexProgram : Rexster → GremlinServer."

Source: http://www.tinkerpop.com/docs/current/

mpollmeier commented 10 years ago

This is the equivalent of your and Daniel's javaPipe3 in GremlinScala 2.5:

new GremlinScalaPipeline().start(graph).V

I wanted to try it with your TestTitan project, but every scenario runs forever in this loop for some reason:

16:14:00.768 [run-main-0] DEBUG GraphUtils$ - 63
16:14:00.864 [run-main-0] DEBUG GraphUtils$ - 95
16:14:00.946 [run-main-0] DEBUG GraphUtils$ - 81

Let me know if this solves it. FYI GremlinScala 3 (for TP3) has been redesigned from ground up, and this problem should not occur.

joan38 commented 10 years ago

The following is not working:

import com.tinkerpop.gremlin.scala._
new GremlinScalaPipeline().start(graph).V

The V method is not found: I have V(graph: Graph) but nothing without a parameter. Should I pass the graph again?

Understood regarding v3.

Cheers

mpollmeier commented 10 years ago

Mmh, this actually relied on some new stuff that I hadn't released back then, sorry about that. I just added some more sugar, so now you can do this:

GremlinScalaPipeline(graph).V

I just released 2.5.2 to Sonatype; it should be synchronised to repo1 shortly:
https://oss.sonatype.org/content/repositories/releases/com/michaelpollmeier/gremlin-scala_2.11/2.5.2/
https://oss.sonatype.org/content/repositories/releases/com/michaelpollmeier/gremlin-scala_2.10/2.5.2/

Please let me know if this works and uses indexes.

joan38 commented 10 years ago

I updated the project with the new version and added the call you introduced as scalaPipe2, and it's still not using indices.

The loop logging the lines below is there because I log every call to searchDmdVertex to show the elapsed time (in ms) of the query:

16:14:00.768 [run-main-0] DEBUG GraphUtils$ - 63
16:14:00.864 [run-main-0] DEBUG GraphUtils$ - 95
16:14:00.946 [run-main-0] DEBUG GraphUtils$ - 81

If you see that many, you can probably also see other lines saying that we are iterating across all vertices.
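
For readers who want to reproduce this kind of measurement, a per-call timing wrapper could look roughly like the sketch below. It is an assumption about the shape of such a helper, not the code from the TestTitan project: `Timing` and `timed` are hypothetical names, and the caller decides how to log the elapsed milliseconds.

```scala
object Timing {
  // Runs `body` once and returns its result together with the elapsed
  // wall-clock time in whole milliseconds.
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    (result, elapsedMs)
  }
}
```

A call site would wrap the query, for example `val (vertices, ms) = Timing.timed(runQuery())` with `runQuery` standing in for the actual lookup, and then log `ms`.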

Cheers

mpollmeier commented 10 years ago

I added a convenience method, GremlinScala.fromElements, to make this work. It's kinda ugly because there's a lot of casting magic and reflection under the hood of Gremlin 2.x, which makes it really hard to understand. E.g. java.lang.Iterable is explicitly treated differently (Elements are being unwrapped), but there's no case that handles scala.collection.Iterable... All my hope lies on TinkerPop 3 ;)
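
A minimal sketch of why a Scala collection slips past such a check, assuming Scala 2.13 or later for `scala.jdk.CollectionConverters` (on the 2.10/2.11 builds mentioned in this thread the equivalent import would be `scala.collection.JavaConverters`). `IterableSketch` and `describe` are hypothetical; this is not Gremlin's actual code, only the general shape of a runtime type check on `java.lang.Iterable`.

```scala
import scala.jdk.CollectionConverters._

object IterableSketch {
  // Hypothetical stand-in for a runtime check that special-cases java.lang.Iterable.
  def describe(starts: Any): String = starts match {
    case _: java.lang.Iterable[_] => "unwrapped"
    case _                        => "opaque"
  }
}

// A Scala List does not implement java.lang.Iterable, so it falls through.
val scalaSide = IterableSketch.describe(List(1, 2, 3))
// Wrapping it with asJava yields a java.util.List, which does match.
val javaSide = IterableSketch.describe(List(1, 2, 3).asJava)
```

Under these assumptions, converting the Scala collection before handing it over is what lets the Java-side check recognise it.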

Anyway, I got it working and will be sending you a PR shortly. Note that GS 2.5.3 is released and will get synchronised to repo1 shortly.