opensearch-project / data-prepper

OpenSearch Data Prepper is a component of the OpenSearch project that accepts, filters, transforms, enriches, and routes data at scale.
https://opensearch.org/docs/latest/clients/data-prepper/index/
Apache License 2.0

Remove Hadoop dependencies #4612

Open dlvenable opened 5 months ago

dlvenable commented 5 months ago

Is your feature request related to a problem? Please describe.

Data Prepper currently pulls in Hadoop dependencies. These introduce CVEs and bring in many transitive dependencies that we may not need.

Hadoop is mostly (exclusively?) used for Parquet support.

Describe the solution you'd like

Remove the Hadoop dependencies while still supporting Parquet.
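One way to confirm that a build no longer ships Hadoop is to probe the runtime classpath for well-known Hadoop classes. The following is only a minimal sketch: the class name `HadoopClasspathCheck` and the list of probed class names are assumptions chosen for illustration, not part of Data Prepper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports which of the given classes can be loaded
// from the current classpath. Not part of Data Prepper.
class HadoopClasspathCheck {

    // Assumed to be representative Hadoop classes; extend as needed.
    static final List<String> HADOOP_CLASSES = List.of(
            "org.apache.hadoop.conf.Configuration",
            "org.apache.hadoop.fs.Path",
            "org.apache.hadoop.fs.FileSystem");

    // Returns true if the named class is visible to this class's loader.
    static boolean isPresent(String className) {
        try {
            // initialize=false: locate the class without running static initializers.
            Class.forName(className, false, HadoopClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            // The class file exists but could not be linked; count it as present.
            return true;
        }
    }

    // Returns the subset of candidates that are present, in input order.
    static List<String> findPresent(List<String> candidates) {
        List<String> present = new ArrayList<>();
        for (String name : candidates) {
            if (isPresent(name)) {
                present.add(name);
            }
        }
        return present;
    }

    public static void main(String[] args) {
        List<String> found = findPresent(HADOOP_CLASSES);
        if (found.isEmpty()) {
            System.out.println("No Hadoop classes found on the classpath.");
        } else {
            System.out.println("Hadoop classes still present: " + found);
        }
    }
}
```

Running this with the same classpath as the Parquet-related plugins would show whether any of the probed Hadoop classes are still reachable. It does not show which dependency brings them in.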

dlvenable commented 5 months ago

The Parquet project itself depends heavily upon Hadoop.

https://issues.apache.org/jira/browse/PARQUET-1126