opencypher / cypher-for-gremlin

Cypher for Gremlin adds Cypher support to any Gremlin graph database.
Apache License 2.0
357 stars 48 forks source link

Question: translation result cannot run on Janus Graph #305

Open yuzhebj opened 5 years ago

yuzhebj commented 5 years ago

First thanks for your contributions for this useful tool.

I translated cypher with parameters to gremlin script using this tool.

Cypher: MATCH (arch:Architecture{_id: {archId}}) RETURN arch AS arch

Translated gremlin script: g.V().as('arch').hasLabel('Architecture').where(__.choose(__.constant(archId), __.constant(archId), __.constant(' cypher.null')).is(neq(' cypher.null')).as(' GENERATED1').select('arch').values('_id').where(eq(' GENERATED1'))).select('arch').project('arch').by(__.choose(neq(' cypher.null'), __.valueMap(true)))

This gremlin script can run successfully on Neo4j. But after I switched db to JanusGraph, I cannot get result and got this error. image

I commit the gremlin script using following code in JavaScript. client.submit(gremlinScript, params)

And I also created index for _id property. image

Could anyone help with this issue? And what is the best practice for this kind of requirements?

dwitry commented 5 years ago

Hello @yuzhebj,

this behavior is related to parameter handling.

To avoid NullPointerException in case parameter has null value, parameter is wrapped into null guard .choose(constant(archId), constant(archId), constant(' cypher.null')). Thats why Gremlin can not fold .has('_id', ...) in .V() step to benefit from index and query is slow.

I agree that this translation is not optimal (although correct) and parameter validation should be configurable.

Until this is implemented, I can only suggest trying a workaround with manual inlining of parameters e.g: MATCH (arch:Architecture{_id: 2}) RETURN arch AS arch.

yuzhebj commented 5 years ago

Hello @dwitry , thanks very much for your reply.

I remove the parameters, and it works. Also the gremlin script is much shorter than before. 👍

But I still wonder whether there is some static variable or parameter can replace the 'cypher.null' in gremlin?

In addition, in my translation result, I have script like this by(__.constant('_id')).select(values).map(cypherContainerIndex())

and get error image

I cannot find any document about it. How can I rewrite this gremlin script? (I have refined my cypher script in several different ways, but still have this cypherContainerIndex() function in gremlin result)

dwitry commented 5 years ago

But I still wonder whether there is some static variable or parameter can replace the 'cypher.null' in gremlin?

Not at the moment. If you will use Java null, the traversal will fail with NPE. This might be improved in TinkerPop 3.5.

(I have refined my cypher script in several different ways, but still have this cypherContainerIndex() function in gremlin result

cypherContainerIndex() is custom function that allows accessing elements by index when type is unknown (refer to docs for more detailed explanation).

It is included in Extensions and Server Plugin.

Have you installed Server Plugin on your target Gremlin Server?

dwitry commented 5 years ago

@yuzhebj can you show Cypher query that translates with cypherContainerIndex() and produces this error?

yuzhebj commented 5 years ago

@dwitry I do have installed server plugin on my gremlin server for this extension.

I use java project to do the translation task offline (I have added CustomPredicate.java and CustomFunctions.java to my project, version of gremlin translation library is 0.9.13, because the latest version 1.0.2 has some feature not supported in JanusGraph).

The version of gremlin server is 3.4.1. I have installed the extension by bin/gremlin-server.sh install org.opencypher.gremlin cypher-gremlin-server-plugin

And I execute gremlin script in a nodejs project.

This is the cypher: MATCH (instance:TEST {_id: 'instanceId_PARAM'}) OPTIONAL MATCH (instance)-[rel *]->(node) WITH *, CASE WHEN node IS NOT NULL THEN {start: startNode(LAST(rel)), end: endNode(LAST(rel)), type: type(LAST(rel))} ELSE NULL END AS relTmp RETURN COLLECT(DISTINCT {start: relTmp.start._id, end: relTmp.end._id, type: relTmp.type}) AS rels

And this is the translation result: g.V().as('instance').hasLabel('TEST').has('_id', eq('instanceId_PARAM')).choose(__.select('instance').as(' cypher.path.start.GENERATED3').repeat(__.outE().as('rel').inV()).emit().until(__.path().from(' cypher.path.start.GENERATED3').count(local).is(gte(21))).as('node').optional(__.select(all, 'rel').as('rel')), __.select('instance').as(' cypher.path.start.GENERATED3').repeat(__.outE().as('rel').inV()).emit().until(__.path().from(' cypher.path.start.GENERATED3').count(local).is(gte(21))).as('node').optional(__.select(all, 'rel').as('rel')), __.constant(' cypher.null').as('rel').as('node')).where(__.choose(__.select('node').choose(__.is(neq(' cypher.null')), __.constant(true), __.constant(false)).is(eq(true)), __.project('start', 'end', 'type').by(__.select('rel').choose(__.tail(local, 1), __.tail(local, 1), __.constant(' cypher.null')).choose(neq(' cypher.null'), __.outV())).by(__.select('rel').choose(__.tail(local, 1), __.tail(local, 1), __.constant(' cypher.null')).choose(neq(' cypher.null'), __.inV())).by(__.select('rel').choose(__.tail(local, 1), __.tail(local, 1), __.constant(' cypher.null')).choose(neq(' cypher.null'), __.label().is(neq('vertex')))), __.constant(' cypher.null')).is(neq(' cypher.null'))).select('instance', 'node', 'rel').project('instance', 'node', 'rel', 'relTmp').by(__.select('instance')).by(__.select('node')).by(__.select('rel')).by(__.choose(__.select('node').choose(__.is(neq(' cypher.null')), __.constant(true), __.constant(false)).is(eq(true)), __.project('start', 'end', 'type').by(__.select('rel').choose(__.tail(local, 1), __.tail(local, 1), __.constant(' cypher.null')).choose(neq(' cypher.null'), __.outV())).by(__.select('rel').choose(__.tail(local, 1), __.tail(local, 1), __.constant(' cypher.null')).choose(neq(' cypher.null'), __.inV())).by(__.select('rel').choose(__.tail(local, 1), __.tail(local, 1), __.constant(' cypher.null')).choose(neq(' cypher.null'), __.label().is(neq('vertex')))), __.constant(' cypher.null'))).as(' GENERATED4').select('instance').as('instance').select(' GENERATED4').select('node').as('node').select(' GENERATED4').select('rel').as('rel').select(' GENERATED4').select('relTmp').as('relTmp').project('start', 'end', 'type').by(__.project(' GENERATED5', ' GENERATED6').by(__.project(' GENERATED7', ' GENERATED8').by(__.identity()).by(__.constant('start')).select(values).map(cypherContainerIndex())).by(__.constant('_id')).select(values).map(cypherContainerIndex())).by(__.project(' GENERATED9', ' GENERATED10').by(__.project(' GENERATED11', ' GENERATED12').by(__.identity()).by(__.constant('end')).select(values).map(cypherContainerIndex())).by(__.constant('_id')).select(values).map(cypherContainerIndex())).by(__.project(' GENERATED13', ' GENERATED14').by(__.identity()).by(__.constant('type')).select(values).map(cypherContainerIndex())).dedup().is(neq(' cypher.null')).fold().project('rels').by(__.identity())

dwitry commented 5 years ago

version of gremlin translation library is 0.9.13, because the latest version 1.0.2 has some feature not supported in JanusGraph)

Just to note that JanusGraph 0.4.0, is compatible with 1.0.0 and even 1.0.2 if you don't need Spark and OLAP capabilities.

And this is the translation result:

So cypherContainerIndex() index comes from relTmp.start._id: frontend does not know the type of relTmp so extraction of start is done in runtime via custom function. Same goes for extracting _id from start.

Considering you've installed the plugin, error message about missing signature of cypherContainerIndex() looks very strange. Does any other custom functions like toInteger or toString work?

If not, may I suggest you to try to import CustomFunctions explicitly in Gremlin Server configuration:

scriptEngines: {
  gremlin-groovy: {
    plugins: {  ...  },
    staticImports: ['org.opencypher.gremlin.traversal.CustomFunctions.*']
...
yuzhebj commented 5 years ago

@dwitry Thanks for your reply.

I updated Gremlin Server configuration like what you suggested and restart the gremlin server. But I still have this error... I also tried to install the server plugin in the gremlin server in JanusGraph 0.4.0. And also failed...

I also followed the manual steps and I can see [INFO] OpLoader - Adding the cypher OpProcessor. after started gremlin-server... Is there any other suggestion?

dwitry commented 5 years ago

@yuzhebj can you clarify if any other custom functions like toInteger or toString work?

yuzhebj commented 5 years ago

@dwitry I tried toString function, and it does NOT work via both gremlin console and nodejs client.

The cypher is MATCH (arch:Architecture {_id:'archId'}) RETURN toString(arch.number)

And the gremlin script is g.V().hasLabel('Architecture').has('_id', eq('archId')).project('toString(arch.number)').by(__.choose(neq(' cypher.null'), __.choose(__.values('number'), __.values('number'), __.constant(' cypher.null'))).map(cypherToString()))

I also got this error image

I followed this guide to install the gremlin extension (both the automated and manual installation parts)

dwitry commented 5 years ago

Ok, still sounds very strange to me, need more details to find cause. Only thing I can suggest is that you post your Gremlin Server configuration yaml here, and I'll try to reproduce this on Monday.

yuzhebj commented 5 years ago

@dwitry this is my gremlin server config yaml. And thanks very much! gremlin-server-yaml.zip

dwitry commented 5 years ago

@yuzhebj could you try to add org.opencypher.gremlin.server.jsr223.CypherPlugin: {} to scriptEngines.gremlin-groovy.plugins in your yaml file?

yuzhebj commented 5 years ago

@dwitry I think it works. I will try more gremlin scripts. Thanks so much! 👍👍

sbespalov commented 4 years ago

@dwitry regarding this comment is it possible to disable parameter validation in current v1.0.4 release? Could the Translator.ParametrizedFlavorBuilder.inlineParameters() be useful here?

sbespalov commented 4 years ago

Also, regarding the inlining parameters workaround, this make possible SQL injection vulnerability