tabular-io / iceberg-rest-image

Simple project to expose a catalog over REST using a Java catalog backend
Apache License 2.0
100 stars 40 forks source link

No s3 support for writing table metadata #8

Closed Cerebus closed 1 year ago

Cerebus commented 1 year ago

From trino, I can create the schema/namespace. On CREATE TABLE, writing the underlying files to s3 (actually, minio) works, but on committing the metadata, thows an exception:

2023-01-18T15:47:10.326 ERROR [org.apache.iceberg.rest.RESTCatalogServlet] - Error processing REST request
org.apache.iceberg.exceptions.RESTException: Unhandled error: ErrorResponse(code=500, type=RuntimeIOException, message=Failed to get file system for path: s3://testdata/tinynation/metadata/00000-00b584bb-e702-4484-9968-92c0948f08ca.metadata.json)
org.apache.iceberg.exceptions.RuntimeIOException: Failed to get file system for path: s3://testdata/tinynation/metadata/00000-00b584bb-e702-4484-9968-92c0948f08ca.metadata.json
        at org.apache.iceberg.hadoop.Util.getFs(Util.java:54)
        at org.apache.iceberg.hadoop.HadoopOutputFile.fromPath(HadoopOutputFile.java:53)
        at org.apache.iceberg.hadoop.HadoopFileIO.newOutputFile(HadoopFileIO.java:83)
        at org.apache.iceberg.BaseMetastoreTableOperations.writeNewMetadata(BaseMetastoreTableOperations.java:159)
        at org.apache.iceberg.jdbc.JdbcTableOperations.doCommit(JdbcTableOperations.java:105)
        at org.apache.iceberg.BaseMetastoreTableOperations.commit(BaseMetastoreTableOperations.java:135)
        at org.apache.iceberg.rest.CatalogHandlers.create(CatalogHandlers.java:315)
        at org.apache.iceberg.rest.CatalogHandlers.updateTable(CatalogHandlers.java:263)
        at org.apache.iceberg.rest.RESTCatalogAdapter.handleRequest(RESTCatalogAdapter.java:343)
        at org.apache.iceberg.rest.RESTCatalogAdapter.execute(RESTCatalogAdapter.java:384)
        at org.apache.iceberg.rest.RESTCatalogServlet.execute(RESTCatalogServlet.java:100)
        at org.apache.iceberg.rest.RESTCatalogServlet.doPost(RESTCatalogServlet.java:78)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
        at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:550)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
        at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:713)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1434)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1349)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
        at org.eclipse.jetty.server.Server.handle(Server.java:516)
        at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
        at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
        at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
        at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:386)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
        at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3"
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3443)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
        at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
        at org.apache.iceberg.hadoop.Util.getFs(Util.java:52)
        ... 42 more
        at org.apache.iceberg.rest.RESTCatalogAdapter.execute(RESTCatalogAdapter.java:401)
        at org.apache.iceberg.rest.RESTCatalogServlet.execute(RESTCatalogServlet.java:100)
        at org.apache.iceberg.rest.RESTCatalogServlet.doPost(RESTCatalogServlet.java:78)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
        at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:550)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
        at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:713)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1434)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1349)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
        at org.eclipse.jetty.server.Server.handle(Server.java:516)
        at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
        at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
        at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
        at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:386)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
        at java.base/java.lang.Thread.run(Thread.java:833)

I think this is a config issue, but I haven't been able to suss out what env vars I need to declare.

Cerebus commented 1 year ago

I figured out setting CATALOG_IO__IMPL=org.apache.iceberg.aws.s3.S3FileIO and setting the normal AWS env vars, but now I'm getting

Query 20230119_195521_00016_7q5u8 failed: Server error: S3Exception: null (Service: S3, Status Code: 400, Request ID: 173BCDF32DE958DE)
Cerebus commented 1 year ago

NM, sorted that out. Closing.

danthelion commented 1 year ago

Hey @Cerebus, could you paste the final config you used to connect to MinIO?

retrry commented 1 year ago

@danthelion I needed to add these variables for it to work with non-aws S3 implementation:

      CATALOG_IO__IMPL=org.apache.iceberg.aws.s3.S3FileIO
      CATALOG_S3_ENDPOINT=https://my-endpoint
      AWS_REGION=my-region
      AWS_SECRET_ACCESS_KEY=secret_key
      AWS_ACCESS_KEY_ID=access_key