opencypher / openCypher

Specification of the Cypher property graph query language
http://www.opencypher.org
Apache License 2.0

ANTLR4 grammar - JavaScript - Issues with quotes and comments #392

Open kenwebb opened 4 years ago

kenwebb commented 4 years ago

Hi,

I have downloaded the ANTLR4 grammar from:

https://www.opencypher.org/resources
https://s3.amazonaws.com/artifacts.opencypher.org/M14/Cypher.g4

I am able to process Cypher.g4 (JavaScript):

java -jar antlr-4.7-complete.jar -Dlanguage=JavaScript Cypher.g4

ANTLR correctly generates five files:

 CypherLexer.js
 CypherLexer.tokens
 CypherListener.js
 CypherParser.js
 Cypher.tokens

My example web page (.html) includes the following JavaScript code:

  const antlr4 = require('xholon/lib/antlr4/index');
  const CypherLexer = require('xholon/lib/antlr4g/CypherLexer');
  const CypherParser = require('xholon/lib/antlr4g/CypherParser');
  const CypherListener = require('xholon/lib/antlr4g/CypherListener');
  const input = 'CREATE (fgh {ijk: 123.4})'; // OK
  const chars = new antlr4.InputStream(input);
  const lexer = new CypherLexer.CypherLexer(chars);
  const tokens = new antlr4.CommonTokenStream(lexer);
  const parser = new CypherParser.CypherParser(tokens);
  const tree = parser.oC_Cypher();
  updateTree(tree); // this is where I run my own code

In this simple example, CREATE (fgh {ijk: 123.4}), I process the Cypher tree and get the results that I expect (it creates a new node in my application).

However, it fails with any Cypher statement that contains single quotes (e.g. 'abc'), double quotes (e.g. "def"), or comments (e.g. // this is a comment). For example:

const input = "CREATE (fgh {ijk: 'This is some text.'})"; // error

or

const input = 'CREATE (fgh {ijk: "This is some text."})'; // error

ErrorListener.js (part of the ANTLR4 distribution) reports (in the browser console window):

line 1:18 token recognition error at: ''Th'
line 1:37 token recognition error at: ''}'
line 1:21 no viable alternative at input 'CREATE (fgh {ijk: is'
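
The columns in these messages are zero-based offsets into the input line, so they can be checked against the string directly. The sketch below is plain JavaScript with no ANTLR involved, and its variable names are only illustrative. It suggests that the lexer rejected the text at both quote characters and then appears to have carried on with the rest of "This" as an ordinary name, which is what the parser then complains about.

```javascript
// The failing input from the report above.
const input = "CREATE (fgh {ijk: 'This is some text.'})";

// Zero-based offsets of the two single quotes.
const openQuote = input.indexOf("'");
const closeQuote = input.lastIndexOf("'");

// Text found at the columns named in the three messages.
const atFirstError = input.slice(openQuote, openQuote + 3);    // reported at 1:18
const atSecondError = input.slice(closeQuote, closeQuote + 2); // reported at 1:37
const atParserError = input.slice(21, 23);                      // reported at 1:21

console.log(openQuote, closeQuote, atFirstError, atSecondError, atParserError);
```

This only locates the errors; it does not explain why the generated JavaScript lexer refuses the quote characters.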

I have developed a temporary work-around by replacing content in Cypher.g4 with content from DOT.g4 (Graphviz dot language), which is included with the ANTLR4 distribution. This lets me handle comments and double-quotes in openCypher, and allows me to continue exploring whether or not I will be able to use openCypher.

I hope this description of the issues I have found will be helpful.

Ken Webb

Mats-SX commented 4 years ago

Hello Ken and thanks for your report.

This is very odd. Our tests only exercise the generated parser through the Java target, and it parses your example queries just fine. I wonder if there is some issue with JavaScript that we are not taking into account.

Which content edits did you perform in order to address the issue? I am not very familiar with JavaScript myself, so I don't know by heart if ', " or other characters require some specific treatment to be handled correctly (given that my hypothesis on JavaScript is correct).

All the best,
Mats

kenwebb commented 4 years ago

Hi Mats,

Thanks for your reply. I've uploaded a copy of my modified ANTLR grammar: Cypher.g4. If you look at its History, you can see how my version differs from the official openCypher file.

The main points are:

  1. Instead of the original StringLiteral and EscapedChar, I have:

     StringLiteral : '"' ( '\\"' | . )*? '"' ;

  2. Instead of the original Comment, I have:

     COMMENT
        : '/*' .*? '*/' -> skip
        ;

     LINE_COMMENT
        : '//' .*? '\r'? '\n' -> skip
        ;

  3. I don't handle single quotes yet, because I haven't had time to do it.
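
To get a feel for what the replacement rules listed above accept, here is a rough sketch that approximates each one with a JavaScript regular expression. This is an illustration only: ANTLR's lexer does not work through JavaScript regexes, the constant names are made up, and SINGLE_QUOTED is a hypothetical counterpart that is not part of the posted grammar.

```javascript
// StringLiteral : '"' ( '\\"' | . )*? '"' ;
// ANTLR's "." matches any character, including newlines, hence [\s\S].
const STRING_LITERAL = /"(?:\\"|[\s\S])*?"/;

// Hypothetical single-quoted variant (an assumption, not in the posted grammar).
const SINGLE_QUOTED = /'(?:\\'|[\s\S])*?'/;

// COMMENT : '/*' .*? '*/' -> skip ;
const BLOCK_COMMENT = /\/\*[\s\S]*?\*\//;

// LINE_COMMENT : '//' .*? '\r'? '\n' -> skip ;
const LINE_COMMENT = /\/\/[\s\S]*?\r?\n/;

// A double-quoted string, with and without escaped quotes inside it.
const plain = STRING_LITERAL.exec('CREATE (fgh {ijk: "This is some text."})')[0];
const escaped = STRING_LITERAL.exec('{s: "a \\"b\\" c"}')[0];

// The single-quoted variant applied to the other failing input.
const single = SINGLE_QUOTED.exec("CREATE (fgh {ijk: 'This is some text.'})")[0];

// A block comment, and a line comment with and without a trailing newline.
const block = BLOCK_COMMENT.exec('MATCH (n) /* all nodes */ RETURN n')[0];
const withNewline = LINE_COMMENT.test('RETURN 1 // trailing\n');
const withoutNewline = LINE_COMMENT.test('RETURN 1 // trailing');

console.log(plain, escaped, single, block, withNewline, withoutNewline);
```

One thing the sketch brings out: as written, LINE_COMMENT requires a terminating newline, so a // comment on the very last line of the input, with no newline after it, would presumably not be matched by that rule.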

I can't really speculate on exactly why my version works. My knowledge of ANTLR is limited. I decided to try substituting content from the grammar for DOT because I know that that grammar worked for me.

I program in Java and JavaScript, often both at the same time. I can't think off-hand of any specific difference that might be relevant here. Both languages use the same single line and multi-line comment characters. Java strings are delimited by double quotes, while JavaScript allows matching single or double quotes. Cypher looks pretty much the same as JavaScript in terms of comments and String quotes.
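
As a quick sanity check on the JavaScript side, the sketch below compares the two failing inputs character by character. It is plain JavaScript with illustrative variable names, and it does not explain the failure. It only suggests that the outer JavaScript quoting leaves the Cypher text untouched, so the lexer receives ordinary ' (code 39) and " (code 34) characters.

```javascript
// The two failing inputs from the original report.
const singleQuoted = "CREATE (fgh {ijk: 'This is some text.'})";
const doubleQuoted = 'CREATE (fgh {ijk: "This is some text."})';

// Collect the zero-based offsets at which the two inputs differ.
const differingOffsets = [];
for (let i = 0; i < singleQuoted.length; i++) {
  if (singleQuoted[i] !== doubleQuoted[i]) {
    differingOffsets.push(i);
  }
}

// Character codes of the opening quote in each input.
const singleCode = singleQuoted.charCodeAt(differingOffsets[0]);
const doubleCode = doubleQuoted.charCodeAt(differingOffsets[0]);

console.log(singleQuoted.length, doubleQuoted.length, differingOffsets, singleCode, doubleCode);
```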

Ken

Mats-SX commented 4 years ago

Hello @kenwebb and thanks for reaching back.

I will leave this topic here for now, but this is a useful point to pick up from when we next plan work on the openCypher grammar.

jacobfriedman commented 2 years ago

I'm also wondering what that problem was.