projectnessie / nessie

Nessie: Transactional Catalog for Data Lakes with Git-like semantics
https://projectnessie.org
Apache License 2.0

Misleading error reporting on the Iceberg side related to namespace checks #6381

Closed dimas-b closed 1 year ago

dimas-b commented 1 year ago

Recent Nessie servers require namespaces to exist before tables can be created in them. However, if a table creation is attempted in a non-existent namespace, the Iceberg-side error is quite confusing. For example:

$ bin/spark-sql \
    --packages org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.2.0,org.projectnessie:nessie-spark-extensions-3.3_2.12:0.53.0 \
    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.projectnessie.spark.extensions.NessieSparkSessionExtensions \
    --conf spark.sql.catalog.nessie=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.nessie.warehouse=$PWD/data2 \
    --conf spark.sql.catalog.nessie.catalog-impl=org.apache.iceberg.nessie.NessieCatalog \
    --conf spark.sql.catalog.nessie.uri=http://localhost:19120/api/v1 \
    --conf spark.sql.catalog.nessie.ref=main \
    --conf spark.sql.catalog.nessie.cache-enabled=false

in SQL shell:

spark-sql> use nessie;
23/03/23 22:34:55 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Time taken: 1.814 seconds
spark-sql> show tables;
Time taken: 0.372 seconds
spark-sql> show databases;
Time taken: 0.048 seconds
spark-sql> CREATE TABLE db2.tbl2 (id bigint, data string);
Error in query: Table db2.tbl2 already exists

In fact, the Nessie-side error is about the namespace db2 being missing.

To continue:

spark-sql> create database db2;
Time taken: 0.144 seconds
spark-sql> CREATE TABLE db2.tbl2 (id bigint, data string);
Time taken: 0.188 seconds
snazy commented 1 year ago

This is an Iceberg issue, no?

dimas-b commented 1 year ago

Maybe, but I guess it's related to how the NessieCatalog presents errors to Iceberg... need to debug this a bit deeper.

Edit: this is about Iceberg-side code for sure, but I filed it on the Nessie side for investigation.

dimas-b commented 1 year ago

Currently Nessie servers respond with a NessieReferenceConflictException when a table is added into a non-existent Namespace.

I do not think this is correct from the API perspective, because NessieReferenceConflictException generally represents errors caused by mismatches between the expected and actual commit hash / state.

The Iceberg NessieCatalog translates those exceptions into the Iceberg CommitFailedException, whose javadoc says "Exception raised when a commit fails because of out of date metadata".

All in all, the current catalog behaviour is not ideal in this case. Perhaps the catalog could raise something like a ValidationException instead.

nastra commented 1 year ago

I have encountered the same thing when adding

@Test
public void testTableCreationWithoutNamespace() {
  Assume.assumeTrue(requiresNamespaceCreate());

  Assertions.assertThatThrownBy(
          () ->
              catalog()
                  .buildTable(TableIdentifier.of("non-existing-namespace", "table"), SCHEMA)
                  .create())
      .isInstanceOf(NoSuchNamespaceException.class)
      .hasMessage("Cannot create table ns.table. Namespace ns does not exist");
}

to CatalogTests in Iceberg. This then fails with https://github.com/apache/iceberg/blob/dc8efe10dbc71cc4a511219f2569a0f8c88fda94/core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java#L201 because Nessie throws a NessieReferenceConflictException and propagates it as a CommitFailedException.

I think in this case Nessie should throw a NessieNamespaceNotExistsException, which can then be mapped to Iceberg's NoSuchNamespaceException.
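To make the proposed mapping concrete, here is a minimal sketch of the translation step. All of the types below are simplified stand-ins defined locally for illustration; they are not the real Iceberg or Nessie classes, and the real change would live in the Nessie server and the Iceberg NessieCatalog commit path:

```java
// Illustrative sketch only: these nested classes model Iceberg's
// NoSuchNamespaceException / CommitFailedException and the Nessie-side
// exception proposed above. None of this is the actual API.
public class CatalogErrorMapping {
  static class NoSuchNamespaceException extends RuntimeException {
    NoSuchNamespaceException(String msg) { super(msg); }
  }

  static class CommitFailedException extends RuntimeException {
    CommitFailedException(String msg) { super(msg); }
  }

  static class NessieNamespaceNotExistsException extends Exception {
    NessieNamespaceNotExistsException(String msg) { super(msg); }
  }

  // Hypothetical translation step in the catalog's commit handling: a
  // namespace-specific Nessie error becomes NoSuchNamespaceException instead
  // of the catch-all CommitFailedException.
  static RuntimeException translate(Exception nessieError, String table, String namespace) {
    if (nessieError instanceof NessieNamespaceNotExistsException) {
      return new NoSuchNamespaceException(
          String.format("Cannot create table %s. Namespace %s does not exist", table, namespace));
    }
    return new CommitFailedException("Commit failed: " + nessieError.getMessage());
  }

  public static void main(String[] args) {
    RuntimeException e =
        translate(new NessieNamespaceNotExistsException("db2 missing"), "db2.tbl2", "db2");
    System.out.println(e.getMessage());
    // prints: Cannot create table db2.tbl2. Namespace db2 does not exist
  }
}
```

With a mapping like this, the `CREATE TABLE db2.tbl2` repro above would report the missing namespace instead of the misleading "Table db2.tbl2 already exists".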

snazy commented 1 year ago

@nastra your test fails in org.apache.iceberg.BaseMetastoreCatalog.BaseMetastoreCatalogTableBuilder#create, in the catch clause

      } catch (CommitFailedException ignored) {
        throw new AlreadyExistsException("Table was created concurrently: %s", identifier);
      }

That's a pretty broad catch. CommitFailedException means "a commit fails because of out of date metadata", which can be a lot of things; "already exists" is just one possible reason for a CommitFailedException. I suspect that CommitFailedException needs to either get subclasses or, better, provide a list/set/map of all the things that went wrong and caused it. WDYT?
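As a rough sketch of that idea (purely illustrative; these are locally defined stand-ins, not Iceberg's real exception types), the commit exception could carry structured conflict reasons so the catch clause can distinguish "key already exists" from "namespace is missing":

```java
import java.util.List;

// Illustrative sketch: a hypothetical CommitFailedException variant that
// carries the individual conflict reasons, so callers no longer have to
// assume every commit failure means "table already exists".
public class ConflictAwareCommit {
  enum ConflictReason { KEY_EXISTS, NAMESPACE_ABSENT, STALE_METADATA }

  static class CommitConflictException extends RuntimeException {
    final List<ConflictReason> reasons;

    CommitConflictException(String message, List<ConflictReason> reasons) {
      super(message);
      this.reasons = reasons;
    }
  }

  // What a catch clause like the one in BaseMetastoreCatalogTableBuilder#create
  // could do once it can inspect the reasons instead of assuming one of them.
  static String describe(CommitConflictException e) {
    if (e.reasons.contains(ConflictReason.NAMESPACE_ABSENT)) {
      return "Namespace does not exist";
    }
    if (e.reasons.contains(ConflictReason.KEY_EXISTS)) {
      return "Table was created concurrently";
    }
    return "Commit failed: " + e.getMessage();
  }

  public static void main(String[] args) {
    CommitConflictException e =
        new CommitConflictException("conflict", List.of(ConflictReason.NAMESPACE_ABSENT));
    System.out.println(describe(e));
    // prints: Namespace does not exist
  }
}
```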

Edit: the CommitFailedException is thrown within the rather generic org.apache.iceberg.TableOperations#commit.

Edit 2: it would probably also be nice to include the original cause in the AlreadyExistsException in create(), and likely elsewhere as well.

snazy commented 1 year ago

With the latest Nessie 0.56.0 release, which includes #6492 and #6503, we can get more details about the individual "conflicts". If there is only one conflict, we can translate it to another exception. Tried it locally (screenshot attached).

Spot the bug in @nastra's test case ;)

nastra commented 1 year ago

I guess that happens if you manually modify code after copying it to GH :)

snazy commented 1 year ago

See https://github.com/apache/iceberg/pull/7283

snazy commented 1 year ago

I think we can close this. Objections?

dimas-b commented 1 year ago

I checked the latest Iceberg code with the Nessie fixes and, from my POV, it behaves (more) reasonably now. I saw some long exception traces in spark-sql in case of namespace validation errors, but that is probably another matter.

I'd be fine with closing this issue if @nastra agrees.

nastra commented 1 year ago

+1 on closing this