tabular-io / iceberg-rest-image

Simple project to expose a catalog over REST using a Java catalog backend
Apache License 2.0

GCS support #36

Open animer3009 opened 1 year ago

animer3009 commented 1 year ago

Hi guys, Are you going to add GCS support? Any ETA?

CrawX commented 1 year ago

It's actually very simple to do, even for yourself if you need it right now: just add implementation "org.apache.iceberg:iceberg-gcp:${icebergVersion}" to build.gradle and build the image yourself.

animer3009 commented 1 year ago

Hi @CrawX, thanks for your reply. What about environment variables?

For s3 we have:

environment:
  - AWS_ACCESS_KEY_ID=admin
  - AWS_SECRET_ACCESS_KEY=password
  - AWS_REGION=us-east-1
  - CATALOG_WAREHOUSE=s3://warehouse/
  - CATALOG_IO__IMPL=org.apache.iceberg.aws.s3.S3FileIO
  - CATALOG_S3_ENDPOINT=http://minio:9000

P.S. What exact command do I need to use to build? Just gradle build?

animer3009 commented 1 year ago

Hi @CrawX, it looks like you missed my reply. Can you help, please? :)

CrawX commented 1 year ago

I just added the mentioned dependency to build.gradle and then rebuilt the image using docker build. You can check the Dockerfile to see how this project is built if you want to do that outside of Docker.

I'm using it locally with fake-gcs-server; this is the env I'm setting:

- CATALOG_WAREHOUSE=gs://warehouse/
- CATALOG_IO__IMPL=org.apache.iceberg.gcp.gcs.GCSFileIO
- CATALOG_GCS_SERVICE_HOST=http://gcs:4443

If you're actually using GCS, it will probably be different (auth etc.). I suggest taking a look at GCPProperties.java.
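The variable names above line up with Iceberg catalog property names if the image translates them mechanically. The following is a minimal Python sketch of that translation, assuming the rule is: drop the CATALOG_ prefix, turn a double underscore into a dash, turn a single underscore into a dot, and lowercase the result. The function name env_to_catalog_properties is hypothetical and the rule itself is an assumption inferred from the names in this thread, so check the project's source before relying on it.

```python
def env_to_catalog_properties(env):
    """Translate CATALOG_* environment variables into catalog property names.

    Assumed rule (not confirmed in this thread): strip the CATALOG_ prefix,
    replace "__" with "-", replace "_" with ".", then lowercase.
    """
    prefix = "CATALOG_"
    props = {}
    for name, value in env.items():
        if not name.startswith(prefix):
            continue  # e.g. AWS_REGION is left for the cloud SDK to read
        key = name[len(prefix):].replace("__", "-").replace("_", ".").lower()
        props[key] = value
    return props


if __name__ == "__main__":
    env = {
        "CATALOG_WAREHOUSE": "gs://warehouse/",
        "CATALOG_IO__IMPL": "org.apache.iceberg.gcp.gcs.GCSFileIO",
        "CATALOG_GCS_SERVICE_HOST": "http://gcs:4443",
        "AWS_REGION": "us-east-1",
    }
    for key, value in sorted(env_to_catalog_properties(env).items()):
        print(f"{key}={value}")
```

Under that assumption, CATALOG_IO__IMPL becomes io-impl and CATALOG_GCS_SERVICE_HOST becomes gcs.service.host, so other gcs.* properties listed in GCPProperties.java could be set the same way.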

animer3009 commented 1 year ago

Hi @CrawX, thank you for your help! I did all of that, and it seems to work because I am able to create tables. But I have trouble writing data to the table and reading from it. I get an error like:

scala> spark.sql("INSERT INTO prod.db.sample VALUES (1, 'John'), (2, 'Jane')")
23/07/26 23:48:48 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 2)
org.apache.iceberg.exceptions.RuntimeIOException: Failed to get file system for path: gs://warehouse-iceberg/prod/db/sample/data/00000-2-759b4512-1ef6-4a0a-be07-235ca0329324-00001.parquet

Here is my spark.conf:

spark.jars.packages=org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.3.0,org.apache.iceberg:iceberg-gcp:1.3.0
spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.defaultCatalog=rest_prod
spark.sql.catalog.rest_prod=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.rest_prod.type=rest
spark.sql.catalog.rest_prod.uri=http://localhost:8181

It creates metadata in GCS, but the data folders seem to be missing.

create log of rest API:

iceberg-rest | 2023-07-26T23:59:07.700 ERROR [org.apache.iceberg.rest.RESTCatalogServlet] - Error processing REST request
iceberg-rest | org.apache.iceberg.exceptions.RESTException: Unhandled error: ErrorResponse(code=404, type=NoSuchTableException, message=Table does not exist: prod.db.sample)
iceberg-rest | org.apache.iceberg.exceptions.NoSuchTableException: Table does not exist: prod.db.sample
iceberg-rest | at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:53)
iceberg-rest | at org.apache.iceberg.rest.CatalogHandlers.loadTable(CatalogHandlers.java:240)
iceberg-rest | at org.apache.iceberg.rest.RESTCatalogAdapter.handleRequest(RESTCatalogAdapter.java:336)
iceberg-rest | at org.apache.iceberg.rest.RESTCatalogAdapter.execute(RESTCatalogAdapter.java:384)
iceberg-rest | at org.apache.iceberg.rest.RESTCatalogServlet.execute(RESTCatalogServlet.java:100)
iceberg-rest | at org.apache.iceberg.rest.RESTCatalogServlet.doGet(RESTCatalogServlet.java:66)
iceberg-rest | at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
iceberg-rest | at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
iceberg-rest | at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
iceberg-rest | at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:550)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
iceberg-rest | at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:763)
iceberg-rest | at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
iceberg-rest | at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1434)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
iceberg-rest | at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
iceberg-rest | at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1349)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
iceberg-rest | at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
iceberg-rest | at org.eclipse.jetty.server.Server.handle(Server.java:516)
iceberg-rest | at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
iceberg-rest | at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
iceberg-rest | at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
iceberg-rest | at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
iceberg-rest | at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
iceberg-rest | at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
iceberg-rest | at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
iceberg-rest | at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:386)
iceberg-rest | at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
iceberg-rest | at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
iceberg-rest | at java.base/java.lang.Thread.run(Thread.java:833)
iceberg-rest | at org.apache.iceberg.rest.RESTCatalogAdapter.execute(RESTCatalogAdapter.java:401)
iceberg-rest | at org.apache.iceberg.rest.RESTCatalogServlet.execute(RESTCatalogServlet.java:100)
iceberg-rest | at org.apache.iceberg.rest.RESTCatalogServlet.doGet(RESTCatalogServlet.java:66)
iceberg-rest | at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
iceberg-rest | at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
iceberg-rest | at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
iceberg-rest | at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:550)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
iceberg-rest | at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:763)
iceberg-rest | at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
iceberg-rest | at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1434)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
iceberg-rest | at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
iceberg-rest | at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1349)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
iceberg-rest | at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
iceberg-rest | at org.eclipse.jetty.server.Server.handle(Server.java:516)
iceberg-rest | at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
iceberg-rest | at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
iceberg-rest | at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
iceberg-rest | at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
iceberg-rest | at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
iceberg-rest | at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
iceberg-rest | at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
iceberg-rest | at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:386)
iceberg-rest | at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
iceberg-rest | at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
iceberg-rest | at java.base/java.lang.Thread.run(Thread.java:833)
iceberg-rest | 2023-07-26T23:59:07.715 ERROR [org.apache.iceberg.rest.RESTCatalogServlet] - Error processing REST request
iceberg-rest | org.apache.iceberg.exceptions.RESTException: Unhandled error: ErrorResponse(code=404, type=NoSuchTableException, message=Table does not exist: prod.db)
iceberg-rest | org.apache.iceberg.exceptions.NoSuchTableException: Table does not exist: prod.db
iceberg-rest | at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:53)
iceberg-rest | at org.apache.iceberg.rest.CatalogHandlers.loadTable(CatalogHandlers.java:240)
iceberg-rest | at org.apache.iceberg.rest.RESTCatalogAdapter.handleRequest(RESTCatalogAdapter.java:336)
iceberg-rest | at org.apache.iceberg.rest.RESTCatalogAdapter.execute(RESTCatalogAdapter.java:384)
iceberg-rest | at org.apache.iceberg.rest.RESTCatalogServlet.execute(RESTCatalogServlet.java:100)
iceberg-rest | at org.apache.iceberg.rest.RESTCatalogServlet.doGet(RESTCatalogServlet.java:66)
iceberg-rest | at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
iceberg-rest | at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
iceberg-rest | at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
iceberg-rest | at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:550)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
iceberg-rest | at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:763)
iceberg-rest | at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
iceberg-rest | at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1434)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
iceberg-rest | at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
iceberg-rest | at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1349)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
iceberg-rest | at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
iceberg-rest | at org.eclipse.jetty.server.Server.handle(Server.java:516)
iceberg-rest | at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
iceberg-rest | at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
iceberg-rest | at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
iceberg-rest | at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
iceberg-rest | at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
iceberg-rest | at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
iceberg-rest | at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
iceberg-rest | at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:386)
iceberg-rest | at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
iceberg-rest | at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
iceberg-rest | at java.base/java.lang.Thread.run(Thread.java:833)
iceberg-rest | at org.apache.iceberg.rest.RESTCatalogAdapter.execute(RESTCatalogAdapter.java:401)
iceberg-rest | at org.apache.iceberg.rest.RESTCatalogServlet.execute(RESTCatalogServlet.java:100)
iceberg-rest | at org.apache.iceberg.rest.RESTCatalogServlet.doGet(RESTCatalogServlet.java:66)
iceberg-rest | at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
iceberg-rest | at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
iceberg-rest | at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
iceberg-rest | at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:550)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
iceberg-rest | at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:763)
iceberg-rest | at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
iceberg-rest | at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1434)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
iceberg-rest | at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
iceberg-rest | at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1349)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
iceberg-rest | at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
iceberg-rest | at org.eclipse.jetty.server.Server.handle(Server.java:516)
iceberg-rest | at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
iceberg-rest | at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
iceberg-rest | at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
iceberg-rest | at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
iceberg-rest | at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
iceberg-rest | at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
iceberg-rest | at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
iceberg-rest | at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:386)
iceberg-rest | at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
iceberg-rest | at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
iceberg-rest | at java.base/java.lang.Thread.run(Thread.java:833)
iceberg-rest | 2023-07-26T23:59:08.237 INFO [org.apache.iceberg.BaseMetastoreCatalog] - Table properties set at catalog level through catalog properties: {}
iceberg-rest | 2023-07-26T23:59:08.239 INFO [org.apache.iceberg.BaseMetastoreCatalog] - Table properties enforced at catalog level through catalog properties: {}
iceberg-rest | 2023-07-26T23:59:08.417 INFO [org.apache.iceberg.BaseMetastoreTableOperations] - Successfully committed to table prod.db.sample in 174 ms
iceberg-rest | 2023-07-26T23:59:08.418 INFO [org.apache.iceberg.BaseMetastoreTableOperations] - Refreshing table metadata from new version: gs://warehouse-iceberg/prod/db/sample/metadata/00000-3e40b56b-aa8c-4b36-a8fa-f0de6368f487.metadata.json

insert log of rest API:

iceberg-rest | 2023-07-26T23:59:56.970 INFO [org.apache.iceberg.BaseMetastoreTableOperations] - Refreshing table metadata from new version: gs://warehouse-iceberg/prod/db/sample/metadata/00000-3e40b56b-aa8c-4b36-a8fa-f0de6368f487.metadata.json
iceberg-rest | 2023-07-26T23:59:57.121 INFO [org.apache.iceberg.BaseMetastoreCatalog] - Table loaded by catalog: rest_backend.prod.db.sample

How can I solve this?

nastra commented 1 year ago

@animer3009 the NoSuchTableException, message=Table does not exist: prod.db error does not necessarily indicate that something went wrong; it could come from a Catalog#tableExists() check. You'll see the same stack trace when running through the https://iceberg.apache.org/spark-quickstart/ example when creating the table. The important part is Successfully committed to table prod.db.sample, which means that everything went as it should during table creation.
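To illustrate why an existence check can leave a "does not exist" error in the server log even when nothing is wrong, here is a toy Python analogue. It assumes the existence check is implemented as "try to load the table and treat a not-found error as false"; ToyCatalog, NoSuchTableError and error_log are hypothetical names made up for this sketch and are not Iceberg classes.

```python
class NoSuchTableError(Exception):
    """Raised by the toy catalog when a table cannot be loaded."""


class ToyCatalog:
    def __init__(self):
        self._tables = {}
        self.error_log = []  # stands in for the server-side ERROR log

    def load_table(self, name):
        if name not in self._tables:
            # The failure is logged before the caller gets to handle it.
            self.error_log.append(f"Table does not exist: {name}")
            raise NoSuchTableError(name)
        return self._tables[name]

    def table_exists(self, name):
        # Existence check built on load_table: a miss is an expected outcome.
        try:
            self.load_table(name)
            return True
        except NoSuchTableError:
            return False

    def create_table(self, name, schema):
        if self.table_exists(name):
            raise ValueError(f"Table already exists: {name}")
        self._tables[name] = schema
        return schema


if __name__ == "__main__":
    catalog = ToyCatalog()
    catalog.create_table("prod.db.sample", {"id": "bigint", "data": "string"})
    print(catalog.table_exists("prod.db.sample"))
    print(catalog.error_log)
```

In this sketch the create call succeeds, yet error_log still holds one "Table does not exist" entry from the check that ran before creation, which mirrors the pattern in the create log above.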

However, Failed to get file system for path: gs://warehouse-iceberg/prod/db/sample/data/00000-2-759b4512-1ef6-4a0a-be07-235ca0329324-00001.parquet indicates that you're most likely missing GCS-related jars on the Spark side that understand the gs scheme.
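One possible way to act on this, sketched in Python as a small helper that extends a Spark configuration: set io-impl on the Spark-side catalog to the same GCSFileIO class the REST server uses, and append whatever extra GCS client packages your setup needs. This is an assumption about the fix, not something confirmed in this thread; the helper name spark_conf_with_gcs_file_io is hypothetical, and the sketch deliberately does not name specific extra artifacts because the right ones depend on your Iceberg and Spark versions.

```python
GCS_FILE_IO = "org.apache.iceberg.gcp.gcs.GCSFileIO"


def spark_conf_with_gcs_file_io(base_conf, catalog, extra_packages=()):
    """Return a copy of base_conf with io-impl set for the given catalog.

    extra_packages are Maven coordinates supplied by the caller (assumption:
    you know which GCS client jars your versions require).
    """
    conf = dict(base_conf)  # do not mutate the caller's dict
    conf[f"spark.sql.catalog.{catalog}.io-impl"] = GCS_FILE_IO

    packages = [p for p in conf.get("spark.jars.packages", "").split(",") if p]
    for package in extra_packages:
        if package not in packages:
            packages.append(package)
    if packages:
        conf["spark.jars.packages"] = ",".join(packages)
    return conf


if __name__ == "__main__":
    base = {
        "spark.jars.packages": (
            "org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.3.0,"
            "org.apache.iceberg:iceberg-gcp:1.3.0"
        ),
        "spark.sql.catalog.rest_prod": "org.apache.iceberg.spark.SparkCatalog",
        "spark.sql.catalog.rest_prod.type": "rest",
        "spark.sql.catalog.rest_prod.uri": "http://localhost:8181",
    }
    for key, value in spark_conf_with_gcs_file_io(base, "rest_prod").items():
        print(f"{key}={value}")
```

The printed lines can be pasted into spark.conf. If the error persists after that, the remaining gap is most likely the jars themselves rather than the catalog properties.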