opensearch-project / data-prepper

Data Prepper is a component of the OpenSearch project that accepts, filters, transforms, enriches, and routes data at scale.
https://opensearch.org/docs/latest/clients/data-prepper/index/
Apache License 2.0

[BUG] JSON parsing errors in raw processor #3357

Closed: byte-lvasquez closed this issue 10 months ago

byte-lvasquez commented 11 months ago

I launched a data-prepper pod on an EKS cluster to send traces from an otel-collector to an AWS managed OpenSearch.

The traces are being sent fine, but the Data Prepper log keeps printing an error on each record that seems to relate to JSON parsing:

2023-09-18T19:48:33,795 [raw-pipeline-processor-worker-5-thread-1] DEBUG org.opensearch.dataprepper.pipeline.ProcessWorker - Pipeline Worker: Submitting 7 processed records to sinks
line 1:0 token recognition error at: 'spanId'
line 1:6 mismatched input '<EOF>' expecting {Function, Integer, Float, Boolean, 'null', JsonPointer, EscapedJsonPointer, VariableIdentifier, String, NOT, '-', '(', '{', OTHER}

Steps to reproduce the behavior:

  1. Installed Istio as a service mesh.
  2. Installed otel-collector otel/opentelemetry-collector:0.85.0 with a zipkin receiver
  3. Configured Istio to send trace data to the otel-collector.
  4. Otel-collector has a pipeline with zipkin as a receiver and data-prepper as exporter.
  5. Data-prepper has a pipelines config as described in this project (https://github.com/opensearch-project/data-prepper/blob/main/examples/trace_analytics.yml) configured to the managed OpenSearch.

Expected behavior: No error logs, or a log that allows me to fix the problem.

Screenshots: otel-collector config (screenshot), data-prepper config (screenshot)


brigidspeed23 commented 11 months ago

We're having a very similar issue. Errors are being thrown, but logs are being processed correctly and pushed to OpenSearch.

(screenshot of the error output)

Pipeline config as follows:

version: "2"
entry-pipeline:
  source:
    s3:
      notification_type: sqs
      compression: gzip
      codec:
        newline:
      sqs:
        queue_url: "https://<redacted>"
        visibility_timeout: "180s"
        maximum_messages: 5
      aws:
        region: "eu-central-1"
        sts_role_arn: "<redacted>"
  processor:
    # Determine the type of log based on the s3 path
    - grok:
        match:
          s3/key: [ '%{WORD}\/%{WORD}\/%{DATA}\/%{DATA:s3_path_app}\/%{GREEDYDATA}' ]
    - parse_json:
        source: message
    - date:
        destination: "@timestamp"
        from_time_received: true
    - delete_entries:
        with_keys: [ "s3" ]
  sink:
    - opensearch:
        hosts: [ "<redacted>" ]
        aws:
          region: "eu-central-1"
          sts_role_arn: "<redacted>"
        index: "${s3_path_app}-%{yyyy.MM.dd}"
        dlq:
          s3:
            bucket: "<redacted>"
            key_path_prefix: "%{yyyy}/%{MM}/%{dd}/"
            region: "eu-central-1"
            sts_role_arn: "<redacted>"

Runtime:
Data Prepper docker image: opensearchproject/data-prepper:2.4.0
Docker version 20.10.23, build 7155243
Amazon Linux 2 AMI: 4.14.322-246.539.amzn2.x86_64

Changing index: "${s3_path_app}-%{yyyy.MM.dd}" to index: "mylogs-%{yyyy.MM.dd}" resolves the issue, but we would like to keep the index name dynamic. Note: the same issue occurs if we store the index name in a field called index instead of s3_path_app.

Update: we've now worked out that this spammy error logging was contributing to a very high CPU load on our Data Prepper instances, far outweighing the CPU requirement for the ingestion itself (many millions of logs per hour, each generating this error message). These errors will also prevent us from pushing our Data Prepper logs to CloudWatch (the cost of these logs would be substantial).

lazarevdmitriy commented 11 months ago

We have the same issue.

Data-prepper throws errors to stdout:

line 1:0 token recognition error at: 'spanId'
line 1:6 mismatched input '<EOF>' expecting {Function, Integer, Float, Boolean, 'null', JsonPointer, EscapedJsonPointer, VariableIdentifier, String, NOT, '-', '(', '{', OTHER}

We use the default data-prepper config described here: https://opensearch.org/docs/latest/data-prepper/common-use-cases/trace-analytics/
The traces are generated by the applications deployed from the opentelemetry-demo: https://github.com/open-telemetry/opentelemetry-demo/blob/main/kubernetes/opentelemetry-demo.yaml

Runtime:
Data Prepper image: opensearchproject/data-prepper:2
K8s version: v1.26.6

rchelaru commented 11 months ago

Has anyone found the problem? I encounter the same issue, but I have a simple setup where all services were installed from Debian packages. I configured the OpenTelemetry Collector to send traces to Data-Prepper and from there to OpenSearch. The Collector seems to send the data successfully, but Data-Prepper cannot process it.

Collector:

extensions:
  health_check:
  pprof:
    endpoint: 0.0.0.0:1777
  zpages:
    endpoint: 0.0.0.0:55679

receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
        cors:
          allowed_origins:

processors:
  batch:

exporters:
  debug:
    verbosity: detailed
  logging:
  otlp/data-prepper:
    endpoint: 0.0.0.0:21890
    tls:
      insecure: true
service:

  pipelines:

    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug, logging, otlp/data-prepper]
        # exporters: [logging, otlphttp]

  extensions: [health_check, pprof, zpages]

Data-Prepper:

entry-pipeline:
  delay: "100"
  source:
    otel_trace_source:
      ssl: false
  processor:
    - otel_trace_raw:
  sink:
    - opensearch:
        hosts: [ "http://localhost:9200" ]
        username: admin
        password: admin
        index_type: trace-analytics-raw

I am using this command (or Postman) to send some data:

curl -i http://localhost:4318/v1/traces -X POST -H "Content-Type: application/json" -d @span.json

where span.json contains:

{
 "resourceSpans": [
   {
     "resource": {
       "attributes": [
         {
           "key": "service.name",
           "value": {
             "stringValue": "test-with-curl222"
           }
         }
       ]
     },
     "scopeSpans": [
       {
         "scope": {
           "name": "manual-test"
         },
         "spans": [
           {
             "traceId": "71699b6fe85982c8c8995ea3d9c95df2",
             "spanId": "3c191d03fa8be066",
             "name": "spanitron",
             "kind": 2,
             "droppedAttributesCount": 0,
             "events": [],
             "droppedEventsCount": 0,
             "status": {
               "code": 1
             }
           }
         ]
       }
     ]
   }
 ]
}

I have also tried other examples such as:

{
  "trace_id": "7bba9f33312b3dbb8b2c2c62bb7abe2d",
  "parent_id": "",
  "span_id": "086e83747d0e381e",
  "name": "/v1/sys/health",
  "trace_state": "okish",
  "kind": "whatkind",
  "start_time": "2021-10-22 16:04:01.209458162 +0000 UTC",
  "end_time": "2021-10-22 16:04:01.209514132 +0000 UTC",
  "attributes": {
    "net.transport": "IP.TCP",
    "net.peer.ip": "172.17.0.1",
    "net.peer.port": "51820",
    "net.host.ip": "10.177.2.152",
    "net.host.port": "26040",
    "http.method": "GET",
    "http.target": "/v1/sys/health",
    "http.server_name": "mortar-gateway",
    "http.route": "/v1/sys/health",
    "http.user_agent": "Consul Health Check",
    "http.scheme": "http",
    "http.host": "10.177.2.152:26040",
    "http.flavor": "1.1"
  },
   "links": [{
           "link1": "http;://localhost",
           "link2": "http://localhjgdsf"
          }],
  "trace_group": "myTEstGroup",
  "duration_in_nanos": 123456,
  "trace_group_fields": {
          "end_time": "2021-10-22 16:04:01.209514132 +0000 UTC",
          "status_code": "ok",
          "duration_in_nanos": 31221
  },
  "service_name": "my servbice name",
  "events": [{
      "name": "",
      "message": "OK",
      "timestamp": "2021-10-22 16:04:01.209512872 +0000 UTC"
    }]
}

The error I receive in data-prepper:


line 1:0 token recognition error at: 'spanId'
line 1:6 mismatched input '<EOF>' expecting {Function, Integer, Float, Boolean, 'null', JsonPointer, EscapedJsonPointer, VariableIdentifier, String, NOT, '-', '(', '{', OTHER}

piermotte commented 11 months ago

I got the same error in data-prepper logs

line 1:0 token recognition error at: 'spanId'
line 1:6 mismatched input '<EOF>' expecting {Function, Integer, Float, Boolean, 'null', JsonPointer, EscapedJsonPointer, VariableIdentifier, String, NOT, '-', '(', '{', OTHER}

My config:

pipelines.yaml: |
    entry-pipeline:
      delay: "100"
      source:
        otel_trace_source:
          ssl: false
      buffer:
        bounded_blocking:
          buffer_size: 10240
          batch_size: 160
      sink:
        - pipeline:
            name: "raw-pipeline"
        - pipeline:
            name: "service-map-pipeline"
    raw-pipeline:
      source:
        pipeline:
          name: "entry-pipeline"
      buffer:
        bounded_blocking:
          buffer_size: 10240
          batch_size: 160
      processor:
        - otel_trace_raw:
        - otel_trace_group:
            hosts: [ "https://${host}:${port}" ]
            insecure: true 
            username: ${username}
            password: "${password}"
      sink:
        - opensearch:
            hosts: ["https://${host}:${port}"]
            insecure: true
            username: ${username}
            password: "${password}"
            index_type: trace-analytics-raw
    service-map-pipeline:
      delay: "100"
      source:
        pipeline:
          name: "entry-pipeline"
      buffer:
        bounded_blocking:
          buffer_size: 10240
          batch_size: 160
      processor:
        - service_map_stateful:
      sink:
        - opensearch:
            hosts: ["https://${host}:${port}"]
            insecure: true
            username: ${username}
            password: "${password}"
            index_type: trace-analytics-service-map

topikachu commented 10 months ago

This is a regression of https://github.com/opensearch-project/data-prepper/commit/1ede4b611db28f027a2808d7db4498bedefa4318. Below is the stack trace:

(stack trace screenshot)

graytaylor0 commented 10 months ago

We will get this fixed by changing the logging behavior of the ANTLR class used here (https://github.com/opensearch-project/data-prepper/blob/c63dea39545dc408d0f7f173cfbd98a6ffab0969/data-prepper-expression/src/main/java/org/opensearch/dataprepper/expression/ParseTreeParser.java#L34) when isValidExpressionStatement is called. We can backport it for the 2.5.1 patch release.
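
For context: ANTLR-generated lexers and parsers register a default ConsoleErrorListener, which writes the "line 1:0 token recognition error ..." messages seen throughout this thread to stderr whenever a syntax error is hit. Below is a minimal sketch of the kind of change described above, assuming a custom error listener is swapped in; the class name QuietErrorListener and the lexer/parser wiring are illustrative, not the actual patch.

import org.antlr.v4.runtime.BaseErrorListener;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Recognizer;

// Replaces ANTLR's default ConsoleErrorListener so that statements which merely
// fail expression validation do not print a syntax error for every record.
public class QuietErrorListener extends BaseErrorListener {
    @Override
    public void syntaxError(final Recognizer<?, ?> recognizer,
                            final Object offendingSymbol,
                            final int line,
                            final int charPositionInLine,
                            final String msg,
                            final RecognitionException e) {
        // Intentionally no console output; a real implementation would still need
        // to surface the failure to the caller (e.g. by counting errors or throwing).
    }
}

// Wiring it up (lexer/parser variables stand in for the generated classes):
// lexer.removeErrorListeners();                      // drops ConsoleErrorListener
// lexer.addErrorListener(new QuietErrorListener());
// parser.removeErrorListeners();
// parser.addErrorListener(new QuietErrorListener());

A listener that throws or logs at TRACE instead of staying silent would keep strict validation behavior while still avoiding the per-record console spam described earlier in this thread.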