opensearch-project / data-prepper

Data Prepper is a component of the OpenSearch project that accepts, filters, transforms, enriches, and routes data at scale.
https://opensearch.org/docs/latest/clients/data-prepper/index/
Apache License 2.0

[FEATURE] Support various EventBridge messages in S3 source #3426

Open oraharsh opened 1 year ago

oraharsh commented 1 year ago

Is your feature request related to a problem?

Yes. My S3 log pipeline listens for Amazon SQS notifications that S3 generates via EventBridge and pulls the data from S3 buckets. The SQS message body is rejected as invalid because it cannot be parsed into S3EventNotification: Unrecognized field "version" (class org.opensearch.dataprepper.plugins.source.S3EventNotification), not marked as ignorable (one known property: "Records"])

What solution would you like?

I want the S3 source in an ingest pipeline to parse any SQS event, whether it was generated via s3-sqs, s3-sns, or s3-eventbridge-sqs.

What alternatives have you considered?

Right now I don't have any alternative.

Do you have any additional context?

2023-09-16T11:58:29.321 [Thread-11] ERROR org.opensearch.dataprepper.plugins.source.parser.S3EventNotificationParser - SQS message with message ID:414afe99-8914-4fb4-b9ed-782c228a0ddb has invalid body which cannot be parsed into S3EventNotification. Unrecognized field "version" (class org.opensearch.dataprepper.plugins.source.S3EventNotification), not marked as ignorable (one known property: "Records"]) at [Source: UNKNOWN; byte offset: #UNKNOWN] (through reference chain: org.opensearch.dataprepper.plugins.source.S3EventNotification["version"]).

ReinGrad commented 1 year ago

It looks like the issue you are facing is that the SQS messages generated by Amazon EventBridge contain an additional "version" field that the S3EventNotificationParser in Data Prepper is not expecting.

Since EventBridge uses a different message format than plain S3 notifications, the parser needs to be updated to handle the extra EventBridge-specific fields.

dlvenable commented 10 months ago

@oraharsh, do you have an example input that we can use?

It seems from the description that there is a "version" field in the input that is not part of the Data Prepper model. One way to help solve this would be to make our model more flexible: require and expect only what we know we need, and ignore any other fields.

Cihaan commented 3 months ago

Hey @dlvenable @oraharsh, any update on this matter? I am having the exact same issue on my pipeline.

Cihaan commented 3 months ago

All good, it is mentioned in the documentation.

There is a property called notification_source that takes one of two values: eventbridge when EventBridge is the middleman, or s3 when the event is sent directly to SQS.

more detail here
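
For anyone landing here with the same error, a minimal sketch of an S3 source using that property might look like the following. The pipeline name, queue URL, region, role ARN, codec, and sink are placeholders chosen for illustration and are not taken from this thread; check the option names against the S3 source documentation for your Data Prepper version.

```yaml
# Sketch only: every name, URL, and ARN below is a hypothetical placeholder.
s3-log-pipeline:
  source:
    s3:
      notification_type: sqs
      # Use "eventbridge" when S3 events reach the queue through EventBridge.
      # Use "s3" when S3 sends notifications straight to SQS.
      notification_source: eventbridge
      codec:
        newline:
      sqs:
        queue_url: "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"
      aws:
        region: "us-east-1"
        sts_role_arn: "arn:aws:iam::123456789012:role/example-role"
  sink:
    - stdout:
```

With notification_source left at s3, an EventBridge-wrapped message would presumably still fail with the "Unrecognized field" error reported above, since the body does not start with a top-level "Records" property.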