projectnessie / nessie

Nessie: Transactional Catalog for Data Lakes with Git-like semantics
https://projectnessie.org
Apache License 2.0
967 stars, 123 forks

[Bug]: Rest API is missing location property in namespace metadata for StarRocks #9014

Closed: sebastienPinel closed this issue 1 week ago

sebastienPinel commented 1 month ago

What happened

I was experimenting with StarRocks and trying to use the Nessie REST API as its Iceberg catalog. When querying a specific database (namespace), I get the following stack trace:

java.lang.NullPointerException: Database pure doesn't exist location
        at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:921) ~[spark-dpp-1.0.0.jar:?]
        at com.starrocks.connector.iceberg.rest.IcebergRESTCatalog.getDB(IcebergRESTCatalog.java:160) ~[starrocks-fe.jar:?]
        at com.starrocks.connector.iceberg.CachingIcebergCatalog.getDB(CachingIcebergCatalog.java:126) ~[starrocks-fe.jar:?]
        at com.starrocks.connector.iceberg.IcebergMetadata.getDb(IcebergMetadata.java:219) ~[starrocks-fe.jar:?]
        at com.starrocks.connector.CatalogConnectorMetadata.getDb(CatalogConnectorMetadata.java:215) ~[starrocks-fe.jar:?]
        at com.starrocks.server.MetadataMgr.lambda$getDb$1(MetadataMgr.java:238) ~[starrocks-fe.jar:?]
        at java.util.Optional.map(Optional.java:265) ~[?:?]
        at com.starrocks.server.MetadataMgr.getDb(MetadataMgr.java:238) ~[starrocks-fe.jar:?]
        at com.starrocks.qe.ShowExecutor$ShowExecutorVisitor.visitShowTableStatement(ShowExecutor.java:462) ~[starrocks-fe.jar:?]
        at com.starrocks.qe.ShowExecutor$ShowExecutorVisitor.visitShowTableStatement(ShowExecutor.java:287) ~[starrocks-fe.jar:?]
        at com.starrocks.sql.ast.ShowTableStmt.accept(ShowTableStmt.java:131) ~[starrocks-fe.jar:?]
        at com.starrocks.sql.ast.AstVisitor.visit(AstVisitor.java:71) ~[starrocks-fe.jar:?]
        at com.starrocks.qe.ShowExecutor.execute(ShowExecutor.java:284) ~[starrocks-fe.jar:?]
        at com.starrocks.qe.StmtExecutor.handleShow(StmtExecutor.java:1602) ~[starrocks-fe.jar:?]
        at com.starrocks.qe.StmtExecutor.execute(StmtExecutor.java:680) ~[starrocks-fe.jar:?]
        at com.starrocks.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:390) ~[starrocks-fe.jar:?]
        at com.starrocks.qe.ConnectProcessor.dispatch(ConnectProcessor.java:586) ~[starrocks-fe.jar:?]
        at com.starrocks.qe.ConnectProcessor.processOnce(ConnectProcessor.java:919) ~[starrocks-fe.jar:?]
        at com.starrocks.mysql.nio.ReadListener.lambda$handleEvent$0(ReadListener.java:69) ~[starrocks-fe.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
        at java.lang.Thread.run(Thread.java:829) ~[?:?]

Digging into the StarRocks code, I found that Nessie's response from /iceberg/v1/{prefix}/namespaces/{namespace} is missing the location property (only the owner property is present).
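To make the failure mode concrete, here is a minimal sketch of the kind of check that produces the exception in the stack trace: the client reads the "location" entry from the namespace properties and fails fast when it is null. The helper name `resolveDbLocation` and the S3 path are hypothetical; StarRocks itself uses Guava's `Preconditions.checkNotNull`, while this sketch uses the JDK's `Objects.requireNonNull`, which throws the same `NullPointerException` type.

```java
import java.util.Map;
import java.util.Objects;

public class NamespaceLocationCheck {

    // Hypothetical helper: looks up "location" in the namespace properties
    // returned by GET /iceberg/v1/{prefix}/namespaces/{namespace}.
    public static String resolveDbLocation(String dbName, Map<String, String> namespaceProperties) {
        String location = namespaceProperties.get("location");
        // Throws NullPointerException with this message when "location" is absent.
        return Objects.requireNonNull(location, "Database " + dbName + " doesn't exist location");
    }

    public static void main(String[] args) {
        // Properties as described in the report: only "owner" is returned.
        Map<String, String> ownerOnly = Map.of("owner", "someone");
        try {
            resolveDbLocation("pure", ownerOnly);
        } catch (NullPointerException e) {
            System.out.println("NPE: " + e.getMessage());
        }

        // Once "location" is also returned, the lookup succeeds.
        Map<String, String> withLocation = Map.of("owner", "someone", "location", "s3://bucket/pure");
        System.out.println("location = " + resolveDbLocation("pure", withLocation));
    }
}
```

Running `main` prints the NPE message for the owner-only map and then the resolved location, which is why a server response that includes "location" is enough to get past this check.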

How to reproduce it

  1. Install StarRocks and Nessie.
  2. Connect StarRocks to Nessie by running this statement through a MySQL client:

     create external catalog iceberg PROPERTIES (
       "type"="iceberg",
       "iceberg.catalog.type"="rest",
       "iceberg.catalog.uri"="http://lakehouse-deploy-nessie:19120/iceberg",
       "iceberg.catalog.warehouse"="s3://[...]",
       "aws.s3.access_key"="[...]",
       "aws.s3.secret_key"="[...]",
       "aws.s3.region" = "[...]",
       "aws.s3.enable_path_style_access"="true",
       "client.factory"="com.starrocks.connector.iceberg.IcebergAwsClientFactory"
     );

Nessie server type (docker/uber-jar/built from source) and version

docker

Client type (Ex: UI/Spark/pynessie ...) and version

Starrocks

Additional information

No response

snazy commented 1 month ago

Can you elaborate why this is a bug in Nessie and not in StarRocks?

sebastienPinel commented 1 month ago

I'm not sure whether it's a Nessie or a StarRocks bug. We tried the Tabular REST catalog adapter, which uses a JdbcCatalog with an in-memory database, and the property is present in that case.

dimas-b commented 1 month ago

I believe this was discussed in Nessie's Zulip chat.

Apparently, running ALTER DATABASE your.db SET LOCATION 'your.location' (e.g. via Spark) should let StarRocks proceed.

If Spark is too much trouble to set up, the same property can be set via the Nessie REST API, but the exact commands are a bit more complex. Feel free to open a thread about it in Zulip; I'm sure we can figure something out.
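One route that may apply, assuming the server's Iceberg REST endpoint accepts namespace property updates, is the Iceberg REST spec's "update namespace properties" call: a POST to {base}/v1/{prefix}/namespaces/{namespace}/properties with a body of the form {"removals": [...], "updates": {...}}. This is not necessarily the Nessie-native command sequence mentioned above. The sketch only builds the request; the prefix "main", the class and method names, and the S3 path are hypothetical placeholders, and the base URI is taken from the catalog definition in the report.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class SetNamespaceLocation {

    // Multi-level namespaces are joined with the 0x1F unit separator and
    // URL-encoded, as the Iceberg REST spec describes for path parameters.
    public static URI propertiesUri(String baseUri, String prefix, List<String> namespaceLevels) {
        String joined = String.join(String.valueOf((char) 0x1F), namespaceLevels);
        String encoded = URLEncoder.encode(joined, StandardCharsets.UTF_8);
        return URI.create(baseUri + "/v1/" + prefix + "/namespaces/" + encoded + "/properties");
    }

    // Body shaped like the spec's UpdateNamespacePropertiesRequest.
    // Naive JSON: assumes the location contains no quotes or backslashes.
    public static String locationUpdateBody(String location) {
        return "{\"removals\":[],\"updates\":{\"location\":\"" + location + "\"}}";
    }

    public static HttpRequest buildRequest(String baseUri, String prefix,
                                           List<String> namespaceLevels, String location) {
        return HttpRequest.newBuilder(propertiesUri(baseUri, prefix, namespaceLevels))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(locationUpdateBody(location)))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical prefix "main"; the real prefix depends on the server's config response.
        HttpRequest request = buildRequest(
                "http://lakehouse-deploy-nessie:19120/iceberg", "main",
                List.of("pure"), "s3://bucket/pure");
        System.out.println(request.method() + " " + request.uri());
        // Sending it would use java.net.http.HttpClient.newHttpClient().send(request, ...),
        // plus whatever authentication the deployment requires.
    }
}
```

Building the request separately from sending it keeps the URI encoding and body shape easy to inspect before pointing it at a live catalog.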

snazy commented 3 weeks ago

@sebastienPinel can you check against Nessie 0.95.0?

sebastienPinel commented 1 week ago

@snazy I checked, and the "location" property is present now, so it should work with StarRocks. I don't have time to verify with StarRocks itself, but if I get back to it, I'll keep you posted!