treeverse / lakeFS

lakeFS - Data version control for your data lake | Git for data
https://docs.lakefs.io
Apache License 2.0

Support EMRFS as underlying filesystem for lakeFSFS #2263

Closed arielshaqed closed 1 year ago

arielshaqed commented 3 years ago

To get EMRFS to run:

  1. Use EMR :shrug:.
  2. Use "S3" protocol URLs (`s3://...`), not "S3A" (`s3a://...`).

To verify that it worked:

  1. Enable S3 access logs on the bucket.
  2. Check the User-Agent recorded in those logs: EMRFS should use a distinctive User-Agent header.
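
A minimal sketch of the log check, assuming the standard space-delimited S3 server access log layout in which the User-Agent is the 17th field. The names `user_agent`, `count_user_agents` and `USER_AGENT_FIELD` are made up for this example, and the exact User-Agent string EMRFS sends is not stated here, so the script only tallies whatever agents appear and leaves spotting the EMRFS one to the reader.

```python
import re
from collections import Counter

# A token is a [bracketed timestamp], a "quoted string", or a bare word.
_TOKEN = re.compile(r'\[[^\]]*\]|"(?:[^"\\]|\\.)*"|\S+')

# 0-based position of the User-Agent field in an S3 server access log record.
# Assumption: the documented field order (owner, bucket, time, remote IP, ...).
USER_AGENT_FIELD = 16


def user_agent(line):
    """Return the User-Agent of one access log record, or None if it is too short."""
    tokens = _TOKEN.findall(line)
    if len(tokens) <= USER_AGENT_FIELD:
        return None
    return tokens[USER_AGENT_FIELD].strip('"')


def count_user_agents(lines):
    """Tally User-Agent values over an iterable of access log lines."""
    return Counter(ua for ua in map(user_agent, lines) if ua is not None)
```

Feeding it the lines of the access log objects written during a job (for example `count_user_agents(open(path))`) and looking at `.most_common()` should show whether requests came from the EMRFS client or from S3A.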

The EMR OutputCommitter might not run (indeed, we probably hope that it doesn't). So verify that performance is reasonable by writing Parquet output with several tens of thousands of partitions.

(Related: #1971)

github-actions[bot] commented 1 year ago

This issue is now marked as stale after 90 days of inactivity, and will be closed soon. To keep it, mark it with the "no stale" label.

github-actions[bot] commented 1 year ago

Closing this issue because it has been stale for 7 days with no activity.