opensearch-project / data-prepper

Data Prepper is a component of the OpenSearch project that accepts, filters, transforms, enriches, and routes data at scale.
https://opensearch.org/docs/latest/clients/data-prepper/index/
Apache License 2.0

Support dynamically applying codecs in the S3 source #4709

Status: Open. Opened by graytaylor0 3 months ago.

graytaylor0 commented 3 months ago

Is your feature request related to a problem? Please describe.
As a user with an S3 bucket that contains a mix of JSON and CSV objects, I would like to use a single Data Prepper pipeline to process these objects based on their file extension, rather than having to create multiple pipelines.

Describe the solution you'd like
I would like a new codec, automatic, that dynamically checks each object's extension to determine which codec to use. For example, when automatic is set, objects with the .csv extension would use the csv codec, objects with the .json extension would use the json codec, and so on.
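To make the proposed behavior concrete, here is a minimal sketch of extension-based codec selection. It is illustrative only: the class ExtensionCodecSelector, the method codecForKey, and the mapping table are hypothetical and are not part of Data Prepper's API. Only the csv and json mappings come from the request above.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical sketch: picks a codec name from an S3 object key's file extension.
 * Class name, method name, and mapping table are illustrative assumptions,
 * not Data Prepper's actual API.
 */
public class ExtensionCodecSelector {

    // Assumed mapping from lowercase file extension to codec name.
    private static final Map<String, String> CODECS_BY_EXTENSION = Map.of(
            "csv", "csv",
            "json", "json");

    /** Returns the codec name for the key's extension, or empty if none matches. */
    public static Optional<String> codecForKey(String objectKey) {
        // Look only at the last path segment so dots in "folders" are ignored.
        String fileName = objectKey.substring(objectKey.lastIndexOf('/') + 1);
        int dot = fileName.lastIndexOf('.');
        if (dot <= 0 || dot == fileName.length() - 1) {
            return Optional.empty(); // no extension, a trailing dot, or a dotfile like ".env"
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(CODECS_BY_EXTENSION.get(extension));
    }

    public static void main(String[] args) {
        System.out.println(codecForKey("logs/2024/events.csv"));  // Optional[csv]
        System.out.println(codecForKey("logs/2024/events.JSON")); // Optional[json]
        System.out.println(codecForKey("logs/2024/README"));      // Optional.empty
    }
}
```

Returning an empty Optional rather than a default codec leaves the fallback policy (skip the object, log it, or use a configured default) to the caller. Compressed keys such as events.csv.gz are not handled in this sketch.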

shenkw1 commented 3 months ago

Spoke with @dlvenable, and I will be working on this.

dlvenable commented 3 months ago

The initial implementation can support reading either the file extension or the Content-Type header from S3. I'd say use the Content-Type first, then the extension.
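The suggested lookup order (Content-Type first, extension second) could look roughly like the following sketch. The class AutomaticCodecResolver, its resolve method, and both mapping tables are hypothetical illustrations, not Data Prepper code; the media types text/csv and application/json are assumed to be the ones worth recognizing.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical sketch: resolve a codec from the object's Content-Type header,
 * falling back to the key's file extension. Names and mappings are assumptions.
 */
public class AutomaticCodecResolver {

    // Assumed media type -> codec mapping.
    private static final Map<String, String> CODECS_BY_CONTENT_TYPE = Map.of(
            "text/csv", "csv",
            "application/json", "json");

    // Assumed extension -> codec mapping.
    private static final Map<String, String> CODECS_BY_EXTENSION = Map.of(
            "csv", "csv",
            "json", "json");

    public static Optional<String> resolve(String contentType, String objectKey) {
        if (contentType != null) {
            // Drop parameters such as "; charset=utf-8" and normalize case.
            String mediaType = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
            String codec = CODECS_BY_CONTENT_TYPE.get(mediaType);
            if (codec != null) {
                return Optional.of(codec);
            }
        }
        // Content-Type missing or unrecognized (e.g. a generic octet-stream type): use the extension.
        String fileName = objectKey.substring(objectKey.lastIndexOf('/') + 1);
        int dot = fileName.lastIndexOf('.');
        if (dot <= 0 || dot == fileName.length() - 1) {
            return Optional.empty();
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(CODECS_BY_EXTENSION.get(extension));
    }

    public static void main(String[] args) {
        System.out.println(resolve("text/csv; charset=utf-8", "data/file.json")); // Optional[csv]
        System.out.println(resolve("binary/octet-stream", "data/file.json"));     // Optional[json]
        System.out.println(resolve(null, "data/file.txt"));                       // Optional.empty
    }
}
```

In this sketch a recognized Content-Type always wins, even when it disagrees with the extension (the first call above). Whether a conflict like that should instead be logged or rejected is a design decision left open here.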

A future implementation might allow for creating rules based on S3 bucket paths.
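One possible shape for such path-based rules is a longest-prefix match over object keys. This is only a hypothetical sketch. The issue does not specify a rule format, and the class PrefixCodecRules, its methods, and the example prefixes are all invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical sketch: each rule maps an S3 key prefix to a codec name,
 * and the longest matching prefix wins. Not an actual Data Prepper feature.
 */
public class PrefixCodecRules {

    private final Map<String, String> codecsByPrefix = new LinkedHashMap<>();

    /** Registers a rule; returns this instance so rules can be chained. */
    public PrefixCodecRules addRule(String prefix, String codec) {
        codecsByPrefix.put(prefix, codec);
        return this;
    }

    /** Returns the codec of the longest prefix that matches the key, if any. */
    public Optional<String> codecFor(String objectKey) {
        String bestPrefix = null;
        for (String prefix : codecsByPrefix.keySet()) {
            if (objectKey.startsWith(prefix)
                    && (bestPrefix == null || prefix.length() > bestPrefix.length())) {
                bestPrefix = prefix;
            }
        }
        return bestPrefix == null ? Optional.empty() : Optional.of(codecsByPrefix.get(bestPrefix));
    }

    public static void main(String[] args) {
        PrefixCodecRules rules = new PrefixCodecRules()
                .addRule("exports/", "json")
                .addRule("exports/reports/", "csv");
        System.out.println(rules.codecFor("exports/reports/q1.dat")); // Optional[csv]
        System.out.println(rules.codecFor("exports/users.dat"));      // Optional[json]
        System.out.println(rules.codecFor("raw/blob.bin"));           // Optional.empty
    }
}
```

Longest-prefix matching lets a broad rule (exports/) coexist with a more specific override (exports/reports/) without depending on the order in which the rules were declared.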