opensearch-project / data-prepper

Data Prepper is a component of the OpenSearch project that accepts, filters, transforms, enriches, and routes data at scale.
https://opensearch.org/docs/latest/clients/data-prepper/index/
Apache License 2.0

Remove Hadoop dependencies #4612

Open dlvenable opened 3 weeks ago

dlvenable commented 3 weeks ago

Is your feature request related to a problem? Please describe.

Data Prepper currently pulls in Hadoop dependencies. These bring in CVEs and many transitive dependencies that we may not need.

Hadoop is mostly (exclusively?) used for Parquet support.

Describe the solution you'd like

Remove the Hadoop dependencies while still supporting Parquet.

dlvenable commented 3 weeks ago

The Parquet project itself depends heavily upon Hadoop.

https://issues.apache.org/jira/browse/PARQUET-1126