opensearch-project / data-prepper

Data Prepper is a component of the OpenSearch project that accepts, filters, transforms, enriches, and routes data at scale.
https://opensearch.org/docs/latest/clients/data-prepper/index/
Apache License 2.0

[BUG] Failure to process "reserved" chars in regular expressions #4510

Open michael-markevich opened 5 months ago

michael-markevich commented 5 months ago

Describe the bug Similar to https://github.com/opensearch-project/data-prepper/issues/3514, the regex parser fails on the example from the documentation: https://github.com/opensearch-project/data-prepper/blob/main/docs/expression_syntax.md#reference-table.

To Reproduce Steps to reproduce the behavior:

  1. Create a pipeline with the following configuration
log-pipeline:
  source:
    http:
      ssl: false

  processor:
    - parse_json:
        source: message
        parse_when: '/message=~"^\w*$"' # Fails
        # parse_when: '/message=~"^\w*\$"' # Also fails
        # parse_when: '/message =~ "^(\\{.*\\}|\\[.*\\])$"' # Also fails

  sink:
    - opensearch:
        hosts: [ 'https://opensearch:9200' ]
        insecure: true
  2. Send in a log message (any).
  3. See the error log:
    2024-05-07T12:07:25,039 [log-pipeline-processor-worker-1-thread-1] ERROR org.opensearch.dataprepper.plugins.processor.parse.AbstractParseProcessor - An exception occurred while using the parse_json processor on Event [org.opensearch.dataprepper.model.log.JacksonLog@25c210a1]
    org.opensearch.dataprepper.expression.ExpressionEvaluationException: Unable to evaluate statement "/message=~"^\w*$""
    at org.opensearch.dataprepper.expression.GenericExpressionEvaluator.evaluate(GenericExpressionEvaluator.java:42) ~[data-prepper-expression-2.7.0.jar:?]
    at org.opensearch.dataprepper.expression.ExpressionEvaluator.evaluateConditional(ExpressionEvaluator.java:28) ~[data-prepper-api-2.7.0.jar:?]
    at org.opensearch.dataprepper.plugins.processor.parse.AbstractParseProcessor.doExecute(AbstractParseProcessor.java:70) ~[parse-json-processor-2.7.0.jar:?]
    at org.opensearch.dataprepper.model.processor.AbstractProcessor.lambda$execute$0(AbstractProcessor.java:54) ~[data-prepper-api-2.7.0.jar:?]
    at io.micrometer.core.instrument.composite.CompositeTimer.record(CompositeTimer.java:69) [micrometer-core-1.11.5.jar:1.11.5]
    at org.opensearch.dataprepper.model.processor.AbstractProcessor.execute(AbstractProcessor.java:54) [data-prepper-api-2.7.0.jar:?]
    at org.opensearch.dataprepper.pipeline.ProcessWorker.doRun(ProcessWorker.java:135) [data-prepper-core-2.7.0.jar:?]
    at org.opensearch.dataprepper.pipeline.ProcessWorker.run(ProcessWorker.java:61) [data-prepper-core-2.7.0.jar:?]
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
    Caused by: org.opensearch.dataprepper.expression.ParseTreeCompositeException
    at org.opensearch.dataprepper.expression.ParseTreeParser.createParseTree(ParseTreeParser.java:78) ~[data-prepper-expression-2.7.0.jar:?]
    at org.opensearch.dataprepper.expression.ParseTreeParser.parse(ParseTreeParser.java:101) ~[data-prepper-expression-2.7.0.jar:?]
    at org.opensearch.dataprepper.expression.ParseTreeParser.parse(ParseTreeParser.java:27) ~[data-prepper-expression-2.7.0.jar:?]
    at org.opensearch.dataprepper.expression.MultiThreadParser.parse(MultiThreadParser.java:35) ~[data-prepper-expression-2.7.0.jar:?]
    at org.opensearch.dataprepper.expression.MultiThreadParser.parse(MultiThreadParser.java:20) ~[data-prepper-expression-2.7.0.jar:?]
    at org.opensearch.dataprepper.expression.GenericExpressionEvaluator.evaluate(GenericExpressionEvaluator.java:38) ~[data-prepper-expression-2.7.0.jar:?]
    ... 12 more
    Caused by: org.opensearch.dataprepper.expression.ExceptionOverview: Multiple exceptions (5)
    |-- org.antlr.v4.runtime.LexerNoViableAltException: null
    at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
    |-- org.antlr.v4.runtime.LexerNoViableAltException: null
    at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
    |-- org.antlr.v4.runtime.InputMismatchException: null
    at org.antlr.v4.runtime.DefaultErrorStrategy.sync(DefaultErrorStrategy.java:270)
    |-- org.antlr.v4.runtime.LexerNoViableAltException: null
    at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
    |-- org.antlr.v4.runtime.LexerNoViableAltException: null
    at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
    line 1:11 token recognition error at: '^'
    line 1:12 token recognition error at: '\'
    line 1:13 token recognition error at: 'w*'
    line 1:15 token recognition error at: '$"'
    line 1:10 mismatched input '"' expecting {JsonPointer, EscapedJsonPointer, String}
    2024-05-07T12:07:25,042 [log-pipeline-processor-worker-1-thread-1] ERROR org.opensearch.dataprepper.plugins.processor.parse.AbstractParseProcessor - An exception occurred while using the parse_json processor on Event [org.opensearch.dataprepper.model.log.JacksonLog@2c0f1eaf]
    org.opensearch.dataprepper.expression.ExpressionEvaluationException: Unable to evaluate statement "/message=~"^\w*$""
  If you escape the dollar sign, you still get an error:
line 1:11 token recognition error at: '^'
line 1:12 token recognition error at: '\'
line 1:13 token recognition error at: 'w*'
line 1:15 token recognition error at: '\'
line 1:16 token recognition error at: '$"'
line 1:10 mismatched input '"' expecting {JsonPointer, EscapedJsonPointer, String}
2024-05-07T12:16:50,189 [log-pipeline-processor-worker-1-thread-1] ERROR org.opensearch.dataprepper.plugins.processor.parse.AbstractParseProcessor - An exception occurred while using the parse_json processor on Event [org.opensearch.dataprepper.model.log.JacksonLog@686b21ea]
org.opensearch.dataprepper.expression.ExpressionEvaluationException: Unable to evaluate statement "/message=~"^\w*\$""
    at org.opensearch.dataprepper.expression.GenericExpressionEvaluator.evaluate(GenericExpressionEvaluator.java:42) ~[data-prepper-expression-2.7.0.jar:?]
    at org.opensearch.dataprepper.expression.ExpressionEvaluator.evaluateConditional(ExpressionEvaluator.java:28) ~[data-prepper-api-2.7.0.jar:?]
    at org.opensearch.dataprepper.plugins.processor.parse.AbstractParseProcessor.doExecute(AbstractParseProcessor.java:70) ~[parse-json-processor-2.7.0.jar:?]
    at org.opensearch.dataprepper.model.processor.AbstractProcessor.lambda$execute$0(AbstractProcessor.java:54) ~[data-prepper-api-2.7.0.jar:?]
    at io.micrometer.core.instrument.composite.CompositeTimer.record(CompositeTimer.java:69) [micrometer-core-1.11.5.jar:1.11.5]
    at org.opensearch.dataprepper.model.processor.AbstractProcessor.execute(AbstractProcessor.java:54) [data-prepper-api-2.7.0.jar:?]
    at org.opensearch.dataprepper.pipeline.ProcessWorker.doRun(ProcessWorker.java:135) [data-prepper-core-2.7.0.jar:?]
    at org.opensearch.dataprepper.pipeline.ProcessWorker.run(ProcessWorker.java:61) [data-prepper-core-2.7.0.jar:?]
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
Caused by: org.opensearch.dataprepper.expression.ParseTreeCompositeException
    at org.opensearch.dataprepper.expression.ParseTreeParser.createParseTree(ParseTreeParser.java:78) ~[data-prepper-expression-2.7.0.jar:?]
    at org.opensearch.dataprepper.expression.ParseTreeParser.parse(ParseTreeParser.java:101) ~[data-prepper-expression-2.7.0.jar:?]
    at org.opensearch.dataprepper.expression.ParseTreeParser.parse(ParseTreeParser.java:27) ~[data-prepper-expression-2.7.0.jar:?]
    at org.opensearch.dataprepper.expression.MultiThreadParser.parse(MultiThreadParser.java:35) ~[data-prepper-expression-2.7.0.jar:?]
    at org.opensearch.dataprepper.expression.MultiThreadParser.parse(MultiThreadParser.java:20) ~[data-prepper-expression-2.7.0.jar:?]
    at org.opensearch.dataprepper.expression.GenericExpressionEvaluator.evaluate(GenericExpressionEvaluator.java:38) ~[data-prepper-expression-2.7.0.jar:?]
    ... 12 more
Caused by: org.opensearch.dataprepper.expression.ExceptionOverview: Multiple exceptions (6)
|-- org.antlr.v4.runtime.LexerNoViableAltException: null
    at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
|-- org.antlr.v4.runtime.LexerNoViableAltException: null
    at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
|-- org.antlr.v4.runtime.LexerNoViableAltException: null
    at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
|-- org.antlr.v4.runtime.LexerNoViableAltException: null
    at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
|-- org.antlr.v4.runtime.LexerNoViableAltException: null
    at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
|-- org.antlr.v4.runtime.InputMismatchException: null
    at org.antlr.v4.runtime.DefaultErrorStrategy.sync(DefaultErrorStrategy.java:270)
line 1:11 token recognition error at: '^'
line 1:12 token recognition error at: '\'
line 1:13 token recognition error at: 'w*'
line 1:15 token recognition error at: '\'
line 1:16 token recognition error at: '$"'
line 1:10 mismatched input '"' expecting {JsonPointer, EscapedJsonPointer, String}
2024-05-07T12:16:50,191 [log-pipeline-processor-worker-1-thread-1] ERROR org.opensearch.dataprepper.plugins.processor.parse.AbstractParseProcessor - An exception occurred while using the parse_json processor on Event [org.opensearch.dataprepper.model.log.JacksonLog@1c5deeb4]
org.opensearch.dataprepper.expression.ExpressionEvaluationException: Unable to evaluate statement "/message=~"^\w*\$""
  Parsing also fails when checking whether the message is a JSON object or array with the following regex:

parse_when: '/message =~ "^(\{.*\}|\[.*\])$"'

Expected behavior Regex should be parsed correctly.
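
For reference, the documented pattern is an ordinary regular expression and behaves as expected in a general-purpose regex engine. The sketch below uses Python's `re` module purely as an illustration; Data Prepper runs on the JVM and regex dialects differ in details, so this only shows the intended matching behavior, not how Data Prepper evaluates the expression. The helper name `is_word_only` is hypothetical.

```python
import re

# Pattern copied from the failing parse_when expression: /message =~ "^\w*$"
WORD_ONLY = re.compile(r"^\w*$")


def is_word_only(message: str) -> bool:
    """Return True when the message contains only word characters (or is empty)."""
    return WORD_ONLY.search(message) is not None


if __name__ == "__main__":
    for sample in ["hello_world", "hello world", ""]:
        print(repr(sample), is_word_only(sample))
```

A message made up only of letters, digits, and underscores matches; a message containing a space does not.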

kkondaka commented 5 months ago

Looks like a bug in expression grammar.

michael-markevich commented 5 months ago

Additional notes to the case:

  1. If I add a regular expression without { } $ (or some other special characters), it works perfectly fine. Our example was tested on a different parser and works there. As mentioned above, even the example from your documentation ("^\w*$") fails the test because of the dollar sign.
  2. This use case is quite important for us, because it helps to distinguish log messages with JSON structure from any other (syslog) messages, avoid parser errors and improve overall performance. Also, such behaviour is a standard feature in Graylog.
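
The use case in note 2 can be sketched outside Data Prepper as a regex pre-filter followed by a JSON parse. This is a minimal illustration in Python, not Data Prepper code: the function names `looks_like_json` and `parse_if_json` are hypothetical, and the sample syslog line is made up for the example.

```python
import json
import re

# Pattern from the issue: the whole message is wrapped in {...} or [...].
# Note: "." does not match newlines here, so multi-line JSON would not pass.
JSON_LIKE = re.compile(r"^(\{.*\}|\[.*\])$")


def looks_like_json(message: str) -> bool:
    """Cheap pre-filter: does the message look like a JSON object or array?"""
    return JSON_LIKE.match(message) is not None


def parse_if_json(message: str):
    """Parse only messages that pass the pre-filter; return None otherwise."""
    if not looks_like_json(message):
        return None  # e.g. a syslog line: skip the parser entirely
    try:
        return json.loads(message)
    except json.JSONDecodeError:
        return None  # looked like JSON but was not valid


if __name__ == "__main__":
    print(parse_if_json('{"level": "info"}'))
    print(parse_if_json("<34>Oct 11 22:14:15 host su: auth failed"))
```

The pre-filter keeps non-JSON messages away from the parser, which is the behavior the `parse_when` condition is meant to provide.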