projectnessie / nessie

Nessie: Transactional Catalog for Data Lakes with Git-like semantics
https://projectnessie.org
Apache License 2.0

Namespace IDs diverge when implicit namespaces are used with Spark ALTER DATABASE SQL #5433

Closed dimas-b closed 1 year ago

dimas-b commented 2 years ago

General spark-sql command line (branch options differ in the steps below):

$ bin/spark-sql \
    --packages org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.0.0 \
    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
    --conf spark.sql.catalog.nessie=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.nessie.warehouse=$PWD/data2 \
    --conf spark.sql.catalog.nessie.catalog-impl=org.apache.iceberg.nessie.NessieCatalog \
    --conf spark.sql.catalog.nessie.uri=http://localhost:19120/api/v1 \
    --conf spark.sql.catalog.nessie.ref=test \
    --conf spark.sql.catalog.nessie.cache-enabled=false

Use case:

  1. Connect Spark to Nessie branch main
  2. spark-sql> use nessie;
  3. spark-sql> CREATE TABLE db2.t3 (id bigint, data string);
  4. Exit from Spark shell
  5. Create branch test from main using Nessie CLI
  6. Connect Spark to Nessie branch main
  7. spark-sql> ALTER DATABASE db2 SET DBPROPERTIES ('Edited-by' = 'John');
  8. Exit from Spark shell
  9. Connect Spark to Nessie branch test
  10. spark-sql> ALTER DATABASE db2 SET DBPROPERTIES ('Edited-by' = 'John');
  11. $ curl 'http://localhost:19120/api/v1/contents/db2?ref=main' results in:
    {
      "type" : "NAMESPACE",
      "id" : "31a96d9c-f6c0-434a-afff-938a09034d37",
      "elements" : [ "db2" ],
      "properties" : {
        "Edited-by" : "John"
      }
    }
  12. $ curl 'http://localhost:19120/api/v1/contents/db2?ref=test' results in:
    {
      "type" : "NAMESPACE",
      "id" : "8e98c4d3-4ddc-4428-8c78-a36893041d4c",
      "elements" : [ "db2" ],
      "properties" : {
        "Edited-by" : "John"
      }
    }

Note that the ID of namespace db2 differs between branches main and test, even though the namespace was logically (implicitly) created by CREATE TABLE db2.t3 before branch test was forked from main.
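
The check in steps 11 and 12 could be scripted roughly as follows. This is only a sketch: it assumes the same local server URI and `/contents/{key}?ref=...` endpoint used in the curl calls above, and the helper names `fetch_content` and `diverging_ids` are made up for illustration and are not part of any Nessie client.

```python
import json
import urllib.parse
import urllib.request

# Assumption: same local Nessie server as in the spark-sql options above.
BASE_URI = "http://localhost:19120/api/v1"


def fetch_content(key, ref, base_uri=BASE_URI):
    """Hypothetical helper: GET {base_uri}/contents/{key}?ref={ref}, return parsed JSON."""
    query = urllib.parse.urlencode({"ref": ref})
    url = f"{base_uri}/contents/{urllib.parse.quote(key)}?{query}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def diverging_ids(contents_by_ref):
    """Return {ref: id} when the refs disagree on the content ID, else {}."""
    ids = {ref: content["id"] for ref, content in contents_by_ref.items()}
    return ids if len(set(ids.values())) > 1 else {}


# Example use against a running server (not executed here):
#   found = {ref: fetch_content("db2", ref) for ref in ("main", "test")}
#   print(diverging_ids(found) or "IDs match")
```

With the two responses shown above, `diverging_ids` would return both IDs keyed by branch name, which is the divergence this issue describes.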

snazy commented 1 year ago

Guess we can close this now, since implicit namespaces are no longer allowed?

dimas-b commented 1 year ago

Yes, closing. Cf. #6246