jbmusso / gremlin-javascript

JavaScript tools for graph processing in Node.js and the browser inspired by the Apache TinkerPop API
MIT License

Graph traversal object "g" initialization #21

Closed ghost closed 9 years ago

ghost commented 9 years ago

When using the client under Node.js in session mode over the internet (forwarded through NAT), the gremlin client does not return stream results unless the "g" graph traversal object is created before any other query. Specifically, for a Cassandra and Elasticsearch setup, I had to run:

client.stream('graph = TitanFactory.open(\'conf/titan-cassandra-es.properties\');g = graph.traversal()');

After doing so, I was able to run queries as normal. It may just be my specific setup; however, it seems logical that the server would have no idea what "g" is unless it is defined beforehand. If this is indeed a bug, I would recommend running the above query before any other queries.
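For anyone who needs the same workaround, here is a minimal sketch of running a bootstrap script once before the first real query. The `execute(script, callback)` function is a hypothetical stand-in for whatever sends a script to the server (it is not taken from the gremlin-javascript source), and `withBootstrap` is a name made up for illustration. It also assumes a session, since variables such as `g` only survive between requests in session mode.

```javascript
// Hypothetical helper: `execute(script, callback)` is a stand-in for the
// function that sends a Gremlin script to the server. It is an assumption,
// not the library's documented API.
function withBootstrap(execute, bootstrapScript) {
  let ready = false;   // true once the bootstrap script has succeeded
  let pending = null;  // queries queued while the bootstrap is in flight

  return function run(script, callback) {
    if (ready) return execute(script, callback);
    if (pending) {
      // Bootstrap already sent; wait for it to finish.
      pending.push([script, callback]);
      return;
    }
    pending = [[script, callback]];
    execute(bootstrapScript, (err) => {
      const queued = pending;
      pending = null;
      if (err) {
        // Report the bootstrap failure to every waiting caller.
        queued.forEach(([, cb]) => cb(err));
        return;
      }
      ready = true;
      queued.forEach(([s, cb]) => execute(s, cb));
    });
  };
}
```

With a real client, `run` would be called in place of the raw query function, with the `TitanFactory.open(...)` line above as the bootstrap script. If the bootstrap fails, the next call to `run` tries it again.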

jbmusso commented 9 years ago

This could be related to your setup. Would you mind sharing the gremlin server .yaml config file you're using? cc @spmallette

spmallette commented 9 years ago

g can be constructed via gremlin server initialization scripts and should be available to in-session requests:

https://github.com/apache/incubator-tinkerpop/blob/3.0.1-incubating/gremlin-server/scripts/empty-sample.groovy#L40

ghost commented 9 years ago

@spmallette I modified my empty-sample.groovy to match the one you provided and restarted with bin/titan.sh start, however gremlin still did not recognize "g". My gremlin-server.yaml (from a fresh build completed just 5 minutes ago) is:

host: localhost
port: 8182
threadPoolWorker: 1
gremlinPool: 8
scriptEvaluationTimeout: 30000
serializedResponseTimeout: 30000
channelizer: org.apache.tinkerpop.gremlin.server.channel.WebSocketChannelizer
graphs: {
  graph: conf/gremlin-server/titan-berkeleyje-server.properties}
plugins:
  - aurelius.titan
scriptEngines: {
  gremlin-groovy: {
    imports: [java.lang.Math],
    staticImports: [java.lang.Math.PI],
    scripts: [scripts/empty-sample.groovy]},
  nashorn: {
    imports: [java.lang.Math],
    staticImports: [java.lang.Math.PI]}}
serializers:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { useMapperFromGraph: graph }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0, config: { useMapperFromGraph: graph }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0, config: { useMapperFromGraph: graph }}
processors:
  - { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000 }}
metrics: {
  consoleReporter: {enabled: true, interval: 180000},
  csvReporter: {enabled: true, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
  jmxReporter: {enabled: true},
  slf4jReporter: {enabled: true, interval: 180000},
  gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
  graphiteReporter: {enabled: false, interval: 180000}}
threadPoolBoss: 1
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 65536
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferHighWaterMark: 32768
writeBufferHighWaterMark: 65536
ssl: {
  enabled: false}

dmill-bz commented 9 years ago

Can you run ./bin/titan.sh -v start and share the output from the gremlin-server? (the last part with all the [INFO])

ghost commented 9 years ago

0 [main] INFO org.apache.tinkerpop.gremlin.server.GremlinServer -
         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----

254 [main] INFO org.apache.tinkerpop.gremlin.server.GremlinServer - Configuring Gremlin Server from conf/gremlin-server/gremlin-server.yaml
382 [main] INFO org.apache.tinkerpop.gremlin.server.util.MetricManager - Configured Metrics ConsoleReporter configured with report interval=180000ms
385 [main] INFO org.apache.tinkerpop.gremlin.server.util.MetricManager - Configured Metrics CsvReporter configured with report interval=180000ms to fileName=/tmp/gremlin-server-metrics.csv
478 [main] INFO org.apache.tinkerpop.gremlin.server.util.MetricManager - Configured Metrics JmxReporter configured with domain= and agentId=
481 [main] INFO org.apache.tinkerpop.gremlin.server.util.MetricManager - Configured Metrics Slf4jReporter configured with interval=180000ms and loggerName=org.apache.tinkerpop.gremlin.server.Settings$Slf4jReporterMetrics
.2086 [main] INFO com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration - Generated unique-instance-id=7f00010121609-pewbuntu1
2144 [main] INFO com.thinkaurelius.titan.diskstorage.Backend - Initiated backend operations thread pool of size 16
2291 [main] INFO com.thinkaurelius.titan.diskstorage.log.kcvs.KCVSLog - Loaded unidentified ReadMarker start time 2015-11-10T22:11:03.390Z into com.thinkaurelius.titan.diskstorage.log.kcvs.KCVSLog$MessagePuller@62010f5c
2292 [main] INFO org.apache.tinkerpop.gremlin.server.GremlinServer - Graph [graph] was successfully configured via [conf/gremlin-server/titan-berkeleyje-server.properties].
2292 [main] INFO org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor - Initialized Gremlin thread pool. Threads in pool named with pattern gremlin-*
3181 [main] INFO org.apache.tinkerpop.gremlin.groovy.engine.ScriptEngines - Loaded nashorn ScriptEngine
3644 [main] INFO org.apache.tinkerpop.gremlin.groovy.engine.ScriptEngines - Loaded gremlin-groovy ScriptEngine
.5042 [main] INFO org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor - Initialized gremlin-groovy ScriptEngine with scripts/empty-sample.groovy
5043 [main] INFO org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor - Initialized GremlinExecutor and configured ScriptEngines.
5054 [main] INFO org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor - A GraphTraversalSource is now bound to [g] with graphtraversalsource[standardtitangraph[berkeleyje:db/berkeley], standard]
5079 [main] INFO org.apache.tinkerpop.gremlin.server.GremlinServer - Executing start up LifeCycleHook
5102 [main] INFO org.apache.tinkerpop.gremlin.server.GremlinServer - Executed once at startup of Gremlin Server.
5221 [main] INFO org.apache.tinkerpop.gremlin.server.AbstractChannelizer - Configured application/vnd.gremlin-v1.0+gryo with org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0
5222 [main] INFO org.apache.tinkerpop.gremlin.server.AbstractChannelizer - Configured application/vnd.gremlin-v1.0+gryo-stringd with org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0
5371 [main] INFO org.apache.tinkerpop.gremlin.server.AbstractChannelizer - Configured application/vnd.gremlin-v1.0+json with org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0
5373 [main] INFO org.apache.tinkerpop.gremlin.server.AbstractChannelizer - Configured application/json with org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0
5471 [gremlin-server-boss-1] INFO org.apache.tinkerpop.gremlin.server.GremlinServer - Gremlin Server configured with worker thread pool of 1, gremlin pool of 8 and boss thread pool of 1.
5471 [gremlin-server-boss-1] INFO org.apache.tinkerpop.gremlin.server.GremlinServer - Channel started at port 8182.
. OK (connected to 127.0.0.1:8182).

spmallette commented 9 years ago

This is confusing because the original issue referred to Cassandra and Elasticsearch but this output indicates BerkeleyDB. Anyway, g should be present and available in this case as I see this output in the logs:

5054 [main] INFO org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor - A GraphTraversalSource is now bound to [g] with graphtraversalsource[standardtitangraph[berkeleyje:db/berkeley], standard]

I guess you ran this directly as @PommeVerte asked you to, in which case it would have started with BerkeleyDB as the default. When you start it this way, is g available (like I said, it should be)?
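The "is now bound to [g]" line is the quickest way to see which backend the server actually picked up. As a rough sketch, the check can be scripted against captured server output. The function name is made up for illustration, and the regular expressions assume the log wording quoted above, which may differ in other Gremlin Server versions.

```javascript
// Assumes the startup wording quoted above:
// "A GraphTraversalSource is now bound to [g] with graphtraversalsource[<graph>, <strategy>]"
const BOUND_RE =
  /A GraphTraversalSource is now bound to \[(\w+)\] with graphtraversalsource\[(.+?), (\w+)\]/g;

// Hypothetical helper: returns one entry per traversal source found in the log.
function findBoundTraversalSources(logText) {
  const found = [];
  for (const match of logText.matchAll(BOUND_RE)) {
    const graph = match[2];
    // Heuristic: the storage backend is the first "[name:" inside the graph
    // description, e.g. "standardtitangraph[berkeleyje:db/berkeley]".
    const backend = /\[(\w+):/.exec(graph);
    found.push({
      name: match[1],
      graph: graph,
      strategy: match[3],
      backend: backend ? backend[1] : null,
    });
  }
  return found;
}
```

Run against the output above, this reports a single source named `g` whose backend is `berkeleyje`, not Cassandra.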

ghost commented 9 years ago

That's very odd, because the titan.sh script indicates that it is forking Cassandra and Elasticsearch on start, and I can see both of their TCP ports open as well (7000 and 9300 are both bound to java processes under netstat). Both gremlin.sh and the example script from the gremlin-javascript README fail to access g. Gremlin reports a "no such property" error, and the example script only logs "All results fetched", with no data. Since the graph is loaded with the Graph of the Gods example graph, the query should have returned something instead of just ending.

Additionally, I fixed the titan-berkeleyje-server.properties reference and changed it to titan-cassandra-es.properties, which is what I have been using to create the "graph" object in gremlin. While gremlin again fails to recognize g, the example script threw this error:

events.js:141
      throw er; // Unhandled 'error' event
      ^

Error: Error: No such property: g for class: Script2 (Error 597)
    at MessageStream.<anonymous> (/home/jpsamaroo/titan-rest/node_modules/gremlin-client/src/gremlinclient.js:265:26)
    at emitOne (events.js:77:13)
    at MessageStream.emit (events.js:169:7)
    at GremlinClient.handleMessage (/home/jpsamaroo/titan-rest/node_modules/gremlin-client/src/gremlinclient.js:84:21)
    at WebSocket.onMessage (/home/jpsamaroo/titan-rest/node_modules/gremlin-client/node_modules/ws/lib/WebSocket.js:418:14)
    at emitTwo (events.js:87:13)
    at WebSocket.emit (events.js:172:7)
    at Receiver.ontext (/home/jpsamaroo/titan-rest/node_modules/gremlin-client/node_modules/ws/lib/WebSocket.js:816:10)
    at /home/jpsamaroo/titan-rest/node_modules/gremlin-client/node_modules/ws/lib/Receiver.js:477:18
    at Receiver.applyExtensions (/home/jpsamaroo/titan-rest/node_modules/gremlin-client/node_modules/ws/lib/Receiver.js:364:5)
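The "Unhandled 'error' event" part of this trace is standard Node.js behavior: an emitter that emits 'error' with no listener attached throws. Below is a minimal sketch of collecting results while handling that event. It assumes the stream returned by the client emits 'data', 'end' and 'error' the way Node streams do. `collect` is a made-up helper name, and a plain `EventEmitter` stands in for the real stream.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: gathers 'data' items and reports the outcome through
// a single callback, so an 'error' event never goes unhandled.
function collect(stream, callback) {
  const results = [];
  let done = false;
  const finish = (err) => {
    if (done) return; // make sure the callback fires only once
    done = true;
    callback(err, err ? undefined : results);
  };
  stream.on('data', (item) => results.push(item));
  stream.on('end', () => finish(null));
  stream.on('error', (err) => finish(err));
}

// Stand-in for the stream a client would return (an assumption for the demo).
const demoStream = new EventEmitter();
collect(demoStream, (err, results) => {
  if (err) {
    console.error('Query failed:', err.message);
    return;
  }
  console.log('Fetched', results.length, 'results');
});
demoStream.emit('error', new Error('No such property: g for class: Script2 (Error 597)'));
```

With the listener in place, the server-side "No such property: g" message is logged as a query failure and the process keeps running.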

dmill-bz commented 9 years ago

If memory serves me well, the titan-cassandra-es.properties file is missing the following line:

gremlin.graph=com.thinkaurelius.titan.core.TitanFactory

It might've been added since I last checked but no harm in making sure it's there. You also say that it should load with the graph of the gods but how is this loaded? Maybe you can share that piece of info so we can rule out any issues on that end.

This is a shot in the dark, as the errors are usually different when this happens, but on the off chance: can you check that you don't have two instances of Titan running? This happens when you run ./bin/titan.sh start twice in a row by mistake. The usual stop command doesn't kill all instances, so you always end up with leftover processes that wreak havoc and need to be handled via kill.

At this point it might also be relevant for you to share your OS, java version and any other info that may be relevant in trying to reproduce this.

spmallette commented 9 years ago

That's very odd, because the titan.sh script indicates that it is forking Cassandra and Elasticsearch

Well - it's not too odd because you have this entry in your gremlin-server.yaml file:

graph: conf/gremlin-server/titan-berkeleyje-server.properties}

which seems to be what's in the pre-packaged file as the default:

https://github.com/thinkaurelius/titan/blob/1.0.0/titan-dist/src/assembly/static/conf/gremlin-server/gremlin-server.yaml#L9

So - you are using berkeleydb (despite the fact that cassandra/es are running with the script). That stinks. I'll fix that for next release.

I fixed the titan-berkeleyje-server.properties reference and changed it to titan-cassandra-es.properties,

that was the right thing to change.

spmallette commented 9 years ago

I just updated Titan as follows:

https://github.com/thinkaurelius/titan/commit/89c0a2b30e798a13e098949c219730b228bcc82a

using that config, my Graph and g get created and use cassandra+es. I confirmed this through the Gremlin Console:

gremlin> :remote connect tinkerpop.server conf/remote.yaml
==>Connected - localhost/127.0.0.1:8182
gremlin> :> g
==>graphtraversalsource[standardtitangraph[cassandrathrift:[127.0.0.1]], standard]
gremlin> :> g.V().count()
==>0

@jpsamaroo can you please try to connect with the Gremlin Console in the fashion that I did above and validate that you achieve similar results?

ghost commented 9 years ago

I added the script that @PommeVerte advised, and it did not do anything, nor did the :remote command that @spmallette offered. Also, I confirmed that after termination, no other processes were holding port 8182 open. However, after running bin/titan.sh -v start, I noticed an error on gremlin-server initialization indicating that "conf/gremlin-server/titan-cassandra-es.properties" could not be found. This is interesting because titan-cassandra-es.properties is normally located in /conf, not in /conf/gremlin-server. So I stopped Titan, moved the file to where the server expected it, and bang, it started without error. From there on, I used the :remote command from @spmallette and was able to access the traversal object "g" with ":> g". I'm not sure where that error is coming from, but it was clearly during startup with bin/titan.sh start.

dmill-bz commented 9 years ago

That's probably because you have it set to /conf/gremlin-server/titan-cassandra-es.properties instead of /conf/titan-cassandra-es.properties in your gremlin-server.yaml file.
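A quick way to catch this kind of mismatch before starting the server is to check that every .properties file referenced in gremlin-server.yaml exists. This is only a sketch: the function names are made up, a regular expression is used in place of a real YAML parser, and it assumes relative paths are resolved against the directory the server is started from.

```javascript
const fs = require('fs');
const path = require('path');

// Pulls every "<name>: <path>.properties" pair out of the YAML text.
// A regex keeps this dependency-free; a real YAML parser would be more robust.
function listGraphConfigs(yamlText) {
  const re = /(\w+):\s*([^\s{},]+\.properties)/g;
  return Array.from(yamlText.matchAll(re), (m) => ({ name: m[1], file: m[2] }));
}

// Returns the entries whose file does not exist under baseDir.
// Assumption: baseDir is the directory the server is launched from.
function findMissingGraphConfigs(yamlText, baseDir) {
  return listGraphConfigs(yamlText).filter(
    (entry) => !fs.existsSync(path.resolve(baseDir, entry.file))
  );
}
```

Calling `findMissingGraphConfigs(fs.readFileSync('conf/gremlin-server/gremlin-server.yaml', 'utf8'), process.cwd())` from the Titan directory would list the `graph` entry whenever the path in the yaml and the file's actual location disagree.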

spmallette commented 9 years ago

From there on, I used the :remote command from @spmallette and was able to access the traversal object "g" with ":> g".

So given that you can connect with :> that means Gremlin Server/Titan are happy. You didn't mention if gremlin-javascript is now happy connecting too....is that the case?

ghost commented 9 years ago

@PommeVerte that is probably the case, however it confuses me as to why that is the case, because this build is fresh from the git repo. I imagine that needs to get fixed on the repo itself, because I made no such modifications to that file on my end.

@spmallette good point, I ran the example script and it did just fine. I received about 12 vertices that I had loaded previously, and then an "All results fetched". Because this now seems to be resolved from the viewpoint of gremlin-javascript, I am going to close this issue.