opensearch-project / data-prepper

Data Prepper is a component of the OpenSearch project that accepts, filters, transforms, enriches, and routes data at scale.
https://opensearch.org/docs/latest/clients/data-prepper/index/
Apache License 2.0

[BUG] [S3 source] Pause in SQS processing when there is an issue in reading S3 object #4569

Open hshardeesi opened 1 month ago

hshardeesi commented 1 month ago

Describe the bug
The Data Prepper S3 source pauses SQS processing with exponential backoff when it fails to read an S3 object, for example a corrupted Parquet file.

To Reproduce
Steps to reproduce the behavior:

  1. Create a Data Prepper pipeline with an S3 source.
  2. Upload corrupted S3 objects to the bucket.
  3. Observe the Data Prepper logs for the message "Pausing SQS processing for XXX seconds due to an error in processing."

Expected behavior
The S3 source plugin should skip corrupted objects and process the next object without delay. The S3 source should back off only when there is an error with the SQS processing itself.
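
A minimal sketch of the expected behavior, assuming failures can be tagged by where they occurred. Every name here (`BackoffDecisionSketch`, `FailureKind`, `ProcessingFailure`) is hypothetical and not taken from the data-prepper code base, and the 60-second cap is an arbitrary illustrative value:

```java
import java.io.IOException;
import java.time.Duration;

// Hypothetical sketch; none of these names come from the data-prepper code base.
final class BackoffDecisionSketch {

    /** Where a failure happened while handling one SQS message. */
    enum FailureKind { SQS_RECEIVE, S3_OBJECT_READ }

    /** Illustrative exception that records the failure location. */
    static final class ProcessingFailure extends Exception {
        final FailureKind kind;

        ProcessingFailure(FailureKind kind, Throwable cause) {
            super(cause);
            this.kind = kind;
        }
    }

    private int consecutiveSqsFailures = 0;

    /** Returns how long to pause before polling SQS again. */
    Duration onFailure(ProcessingFailure failure) {
        if (failure.kind == FailureKind.S3_OBJECT_READ) {
            // A bad object (e.g. corrupted Parquet) says nothing about SQS health:
            // skip it and continue with the next message immediately.
            return Duration.ZERO;
        }
        // SQS itself is failing: back off exponentially, capped at 60 seconds.
        consecutiveSqsFailures++;
        long seconds = Math.min(60L, 1L << Math.min(consecutiveSqsFailures, 6));
        return Duration.ofSeconds(seconds);
    }

    /** Resets the backoff after a successful SQS poll. */
    void onSuccess() {
        consecutiveSqsFailures = 0;
    }
}
```

With this split, a corrupted object wrapped as `new ProcessingFailure(FailureKind.S3_OBJECT_READ, new IOException("corrupt"))` yields a zero pause, while repeated `SQS_RECEIVE` failures yield growing pauses.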

Additional context

[s3-source-sqs-2] ERROR org.opensearch.dataprepper.plugins.source.s3.SqsWorker - Error processing from S3: java.io.IOException: can not read class org.apache.parquet.format.FileMetaData: Required field 'num_rows' was not found in serialized data! Struct: org.apache.parquet.format.FileMetaData$FileMetaDataStandardScheme@62202421. Retrying with exponential backoff.
[s3-source-sqs-2] INFO  org.opensearch.dataprepper.plugins.source.s3.SqsWorker - Pausing SQS processing for 19.858 seconds due to an error in processing.

dlvenable commented 1 month ago

We may also want to pause on authentication errors from S3. This can help if the user has permissions to SQS, but not S3.

dlvenable commented 1 month ago

Also, we should keep the stack trace when we get unknown errors. It is nice to cut it out for authentication errors, but when an unknown error occurs, the stack trace helps with debugging.
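
A rough sketch of that logging rule, assuming authentication failures can be recognized by exception type. `ErrorLogSketch` and `AccessDeniedFailure` are hypothetical stand-ins, not data-prepper or AWS SDK classes, and the method builds a message string rather than calling the project's actual logger:

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical sketch; names are illustrative and not data-prepper APIs.
final class ErrorLogSketch {

    /** Stand-in for an S3 permission failure. */
    static final class AccessDeniedFailure extends RuntimeException {
        AccessDeniedFailure(String message) {
            super(message);
        }
    }

    /** Builds the text to log for an error raised while reading an S3 object. */
    static String describe(Throwable error) {
        if (error instanceof AccessDeniedFailure) {
            // Known cause: the message alone is enough, so omit the stack trace.
            return "Access denied reading from S3: " + error.getMessage();
        }
        // Unknown cause: keep the full stack trace to help with debugging.
        StringWriter trace = new StringWriter();
        error.printStackTrace(new PrintWriter(trace, true));
        return "Unexpected error processing from S3: " + trace;
    }
}
```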