projectnessie / nessie

Nessie: Transactional Catalog for Data Lakes with Git-like semantics
https://projectnessie.org
Apache License 2.0

[Docs]: Missing argument in Nessie GC documentation for the "gc/expire" phase causes "UnknownHostException" when using MinIO as the data lake. #9991

Open kishlay-kr opened 1 day ago

kishlay-kr commented 1 day ago

Issue description


The documentation for the Nessie GC "gc" arguments is missing one important field. Because of this omission, the expire phase of nessie-gc fails with the error "Received an UnknownHostException when attempting to interact with a service".

Setup:

  1. Nessie version: 0.99.0
  2. Trino version: 459
  3. Minio version: RELEASE.2024-08-29T01-40-52Z

I am using Minio as the data lake for my local setup.

Steps to reproduce:

  1. Run the expire phase of the nessie-gc tool with MinIO as the data lake: java -jar nessie-gc.jar expire -<other args>
  2. Use the S3 arguments provided in the documentation.

Proposed fix:

For the expire phase, nessie-gc accepts some arguments that are specific to the data lake. The documentation currently lists:

For S3:
    - s3.access-key-id
    - s3.secret-access-key
    - s3.endpoint, if you use an S3 compatible object store like MinIO

One more argument is needed here: s3.path-style-access=true

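Without this argument, nessie-gc addresses the S3 data lake in the AWS virtual-hosted style, which puts the bucket name in front of the endpoint host name. In our setup that combined host name cannot be resolved, which gives the "UnknownHostException".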
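
To illustrate the difference, here is a minimal Python sketch of the two S3 addressing styles. It is not taken from nessie-gc or the AWS SDK. The helper `s3_object_url`, the endpoint `http://minio:9000`, and the bucket and key names are all hypothetical and only show how the host name changes.

```python
from urllib.parse import urlsplit


def s3_object_url(endpoint: str, bucket: str, key: str, path_style: bool) -> str:
    """Build an object URL for an S3-compatible endpoint (illustrative helper only)."""
    parts = urlsplit(endpoint)
    if path_style:
        # Path-style: the bucket is the first path segment, the host is unchanged.
        return f"{parts.scheme}://{parts.netloc}/{bucket}/{key}"
    # Virtual-hosted style: the bucket becomes a prefix of the host name.
    return f"{parts.scheme}://{bucket}.{parts.netloc}/{key}"


if __name__ == "__main__":
    endpoint = "http://minio:9000"  # assumed local MinIO endpoint
    # The host becomes "warehouse.minio", which a local DNS setup usually cannot resolve.
    print(s3_object_url(endpoint, "warehouse", "data/file.parquet", path_style=False))
    # The host stays "minio", so the request reaches the MinIO server.
    print(s3_object_url(endpoint, "warehouse", "data/file.parquet", path_style=True))
```

With path-style access the host name stays exactly as given in s3.endpoint, which is why a local MinIO host can still be resolved.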

snazy commented 1 day ago

Hi @kishlay-kr, do you want to provide a PR to add some more context information to the help message?

kishlay-kr commented 22 hours ago

Sure, will raise a PR for the same.