verdict-project / verdict

Interactive-Speed Analytics: 200x Faster, 200x Fewer Cluster Resources, Approximate Query Processing
http://verdictdb.org
Apache License 2.0

[Presto] Missing REGEX functions #377

Closed commercial-hippie closed 5 years ago

commercial-hippie commented 5 years ago

https://prestosql.io/docs/current/functions/regexp.html

I see Verdict has the REGEXP_CONTAINS function added but not the Presto equivalent regexp_like.

Other functions that I think are missing:

REGEXP_EXTRACT
REGEXP_EXTRACT_ALL

I've tried adding them to the following files and compiling:

src/main/antlr4/org/verdictdb/parser/VerdictSQLLexer.g4
src/main/antlr4/org/verdictdb/parser/VerdictSQLParser.g4

But for some reason it didn't work.
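
For readers unfamiliar with the three Presto functions named above, here is a minimal Java sketch of their documented behavior, using `java.util.regex`. The class and method names (`PrestoRegexSketch`, `regexpLike`, `regexpExtract`, `regexpExtractAll`) are made up for illustration and are not part of VerdictDB or Presto. The sketch assumes Java-style pattern syntax and that a failed extraction yields null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrestoRegexSketch {

    // regexp_like: true if the pattern matches anywhere in the string
    // (a "contains" match, not a full-string match).
    static boolean regexpLike(String s, String pattern) {
        return Pattern.compile(pattern).matcher(s).find();
    }

    // regexp_extract: the first substring matched by the pattern,
    // or null when nothing matches (assumed to mirror SQL NULL).
    static String regexpExtract(String s, String pattern) {
        Matcher m = Pattern.compile(pattern).matcher(s);
        return m.find() ? m.group() : null;
    }

    // regexp_extract_all: every non-overlapping substring matched by the pattern.
    static List<String> regexpExtractAll(String s, String pattern) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(pattern).matcher(s);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // In Java source, "\\d+b" denotes the regex \d+b.
        System.out.println(regexpLike("1a 2b 14m", "\\d+b"));      // true
        System.out.println(regexpExtract("1a 2b 14m", "\\d+"));    // 1
        System.out.println(regexpExtractAll("1a 2b 14m", "\\d+")); // [1, 2, 14]
    }
}
```

Note that `regexpLike` is the only one of the three that returns a boolean, which is why it can appear on its own as a WHERE condition.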

commercial-hippie commented 5 years ago

When I add regexp_like('1a 2b 14m', '\\d+b') (copied from your unit tests) to a simple query and run it with the latest Verdict jar compiled from the master branch, I get the following error:

Error running instance method java.lang.RuntimeException: syntax error occurred:no viable alternative at input 'regexp_like('1a 2b 14m', '\\d+b')'
    at org.verdictdb.sqlreader.VerdictDBErrorListener.syntaxError(VerdictDBErrorListener.java:35)
    at org.antlr.v4.runtime.ProxyErrorListener.syntaxError(ProxyErrorListener.java:65)
    at org.antlr.v4.runtime.Parser.notifyErrorListeners(Parser.java:564)
    at org.antlr.v4.runtime.DefaultErrorStrategy.reportNoViableAlternative(DefaultErrorStrategy.java:308)
    at org.antlr.v4.runtime.DefaultErrorStrategy.reportError(DefaultErrorStrategy.java:145)
    at org.verdictdb.parser.VerdictSQLParser.predicate(VerdictSQLParser.java:5708)
    at org.verdictdb.parser.VerdictSQLParser.search_condition_not(VerdictSQLParser.java:5263)
    at org.verdictdb.parser.VerdictSQLParser.search_condition_or(VerdictSQLParser.java:5200)
    at org.verdictdb.parser.VerdictSQLParser.search_condition(VerdictSQLParser.java:5140)
    at org.verdictdb.parser.VerdictSQLParser.query_specification(VerdictSQLParser.java:6041)
    at org.verdictdb.parser.VerdictSQLParser.query_expression(VerdictSQLParser.java:5753)
    at org.verdictdb.parser.VerdictSQLParser.select_statement(VerdictSQLParser.java:1836)
    at org.verdictdb.parser.VerdictSQLParser.verdict_statement(VerdictSQLParser.java:578)
    at org.verdictdb.coordinator.ExecutionContext.identifyQueryType(ExecutionContext.java:796)
    at org.verdictdb.coordinator.ExecutionContext.sql(ExecutionContext.java:152)
    at org.verdictdb.jdbc41.VerdictStatement.execute(VerdictStatement.java:107)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)

Any idea what might be causing this?

pyongjoo commented 5 years ago

@dongyoungy Can you take a look? I think we can update the pre-compiled jar as well as the one uploaded to the Maven repo.

dongyoungy commented 5 years ago

@commercial-hippie The test worked fine on my end. I have seen in the past that changes in our parser do not get compiled correctly when you simply run mvn package -DskipTests. You might want to try mvn clean package -DskipTests to build the jar file.

If it doesn't work, I just deployed the current master branch version of VerdictDB as v0.5.10-SNAPSHOT.

Could you try the released jar file (https://github.com/mozafari/verdictdb/releases/tag/v0.5.10-SNAPSHOT) or the following Maven dependency for VerdictDB?

<dependency>
  <groupId>org.verdictdb</groupId>
  <artifactId>verdictdb-core</artifactId>
  <version>0.5.10-SNAPSHOT</version>
</dependency>

commercial-hippie commented 5 years ago

Thanks for that. I just tried that latest jar and did a completely clean setup of the proxy I built to query Verdict, but unfortunately it still failed.

I'm running the following query just to try and get it to work:

SELECT * FROM uptime WHERE regexp_like('1a 2b 14m', '\\d+b')

uptime is a simple table, with a single string column and a single row.

I've tried a few other variations as well, but none of them work. Are there perhaps other tests I can try?

dongyoungy commented 5 years ago

@commercial-hippie Thanks for the info. I have created a pull request fixing this issue. The problem was that the parser grammar listed REGEXP_LIKE only among the functions and not among the possible predicates (the constructs that can be used on their own in a WHERE clause).
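
To illustrate the distinction, here is a toy Java sketch of a WHERE-condition check with two alternatives: a boolean-valued function listed as a predicate, or a comparison between two expressions. This is not VerdictDB's ANTLR grammar; the class `WhereConditionSketch`, the method `acceptsAsWhereCondition`, and the `predicateFunctions` set are hypothetical names used only to show why a bare function call is rejected when it is not listed among the predicates.

```java
import java.util.Locale;
import java.util.Set;

public class WhereConditionSketch {

    // Hypothetical toy check, not the real parser: a WHERE condition is accepted
    // if it is (1) a call to a function listed as a predicate, or
    // (2) something that contains a comparison operator.
    static boolean acceptsAsWhereCondition(String condition, Set<String> predicateFunctions) {
        String s = condition.trim();
        int paren = s.indexOf('(');
        String name = paren > 0
                ? s.substring(0, paren).trim().toLowerCase(Locale.ROOT)
                : "";

        // Alternative 1: the function is known to the predicate rule.
        if (predicateFunctions.contains(name) && s.endsWith(")")) {
            return true;
        }

        // Alternative 2: expression, comparison operator, expression.
        // Crude on purpose: it does not look inside string literals.
        return s.matches(".+(=|<>|<=|>=|<|>).+");
    }
}
```

With an empty `predicateFunctions` set, the condition `regexp_like('1a 2b 14m', '\\d+b')` matches neither alternative, which loosely mirrors the "no viable alternative" error above. Adding `regexp_like` to the set makes the same condition acceptable.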

Please try the fix in that PR if you want to use it before merging it into master and let me know if it still does not work. Thanks.

commercial-hippie commented 5 years ago

@dongyoungy thanks! Just did a test - it works as expected. Thanks!