dimas-b closed this issue 1 year ago.
This is an Iceberg issue, no?
Maybe, but I guess it's related to how the NessieCatalog presents errors to Iceberg... I need to debug this a bit deeper.
Edit: this is about Iceberg-side code for sure, but I filed it on the Nessie side for investigation.
Currently Nessie servers respond with a NessieReferenceConflictException when a table is added to a non-existent namespace.
I do not think this is correct from the API perspective, because NessieReferenceConflictException generally represents errors caused by mismatches between the expected and actual commit hash / state.
The Iceberg NessieCatalog translates those exceptions into the Iceberg CommitFailedException, whose javadoc says: "Exception raised when a commit fails because of out of date metadata."
All in all, the current catalog behaviour is not ideal in this case. Perhaps the catalog could raise something like a ValidationException instead.
I have encountered the same thing when adding the following test to CatalogTests in Iceberg:

```java
@Test
public void testTableCreationWithoutNamespace() {
  Assume.assumeTrue(requiresNamespaceCreate());
  Assertions.assertThatThrownBy(
          () ->
              catalog()
                  .buildTable(TableIdentifier.of("non-existing-namespace", "table"), SCHEMA)
                  .create())
      .isInstanceOf(NoSuchNamespaceException.class)
      .hasMessage("Cannot create table ns.table. Namespace ns does not exist");
}
```

It then fails at https://github.com/apache/iceberg/blob/dc8efe10dbc71cc4a511219f2569a0f8c88fda94/core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java#L201 because Nessie throws a NessieReferenceConflictException, which the catalog propagates as a CommitFailedException.
I think in this case Nessie should throw a NessieNamespaceNotExistsException, which can then be mapped to Iceberg's NoSuchNamespaceException.
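A minimal sketch of that mapping on the catalog side, assuming simplified stand-in classes named after the exceptions in this thread (they are not the real Nessie or Iceberg types, and `toIceberg` is a hypothetical helper). The point is that a missing namespace becomes a NoSuchNamespaceException with the original error kept as the cause, while genuine hash/state conflicts keep mapping to CommitFailedException. The message format reuses the one the test above expects.

```java
// Hypothetical sketch: every class here is a simplified stand-in named after the
// Nessie/Iceberg exceptions discussed above; this is not the real NessieCatalog code.
public class NessieToIcebergErrors {

  // Stand-ins for Nessie-side errors (NessieNamespaceNotExistsException is the name proposed above).
  static class NessieReferenceConflictException extends Exception {
    NessieReferenceConflictException(String message) {
      super(message);
    }
  }

  static class NessieNamespaceNotExistsException extends Exception {
    NessieNamespaceNotExistsException(String message) {
      super(message);
    }
  }

  // Stand-ins for Iceberg-side exceptions.
  static class NoSuchNamespaceException extends RuntimeException {
    NoSuchNamespaceException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  static class CommitFailedException extends RuntimeException {
    CommitFailedException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /** Maps a Nessie error raised while creating namespace.table to an Iceberg exception. */
  static RuntimeException toIceberg(Exception nessieError, String namespace, String table) {
    if (nessieError instanceof NessieNamespaceNotExistsException) {
      // A missing namespace is a validation problem, not out-of-date metadata.
      return new NoSuchNamespaceException(
          String.format(
              "Cannot create table %s.%s. Namespace %s does not exist", namespace, table, namespace),
          nessieError);
    }
    if (nessieError instanceof NessieReferenceConflictException) {
      // Expected-vs-actual hash/state mismatches keep the existing mapping.
      return new CommitFailedException(nessieError.getMessage(), nessieError);
    }
    // Anything else is wrapped unchanged (hypothetical fallback).
    return new RuntimeException(nessieError);
  }
}
```

Keeping the Nessie exception as the cause means the original server-side message (e.g. which namespace was missing) still shows up in stack traces.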
@nastra your test fails in org.apache.iceberg.BaseMetastoreCatalog.BaseMetastoreCatalogTableBuilder#create, in the catch clause:

```java
} catch (CommitFailedException ignored) {
  throw new AlreadyExistsException("Table was created concurrently: %s", identifier);
}
```
That's a pretty broad catch. CommitFailedException means "a commit fails because of out of date metadata", which can cover a lot of things. "Already exists" is just one possible reason for a CommitFailedException. I suspect that CommitFailedException needs either subclasses or, better, a list/set/map of all the things that went wrong and caused it. WDYT?
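To make the "list/set of reasons" idea concrete, here is a minimal sketch, assuming a hypothetical `ReasonedCommitFailedException` stand-in and a hypothetical `Reason` enum (neither exists in Iceberg under these names). The catch in create() would then only report "already exists" when that is the single recorded reason, and would pass the original exception along as the cause.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch: a stand-in for CommitFailedException that records *why* the commit
// failed. None of these types exist in Iceberg under these names.
public class CommitFailureReasons {

  enum Reason { TABLE_ALREADY_EXISTS, NAMESPACE_MISSING, STALE_METADATA }

  static class ReasonedCommitFailedException extends RuntimeException {
    private final Set<Reason> reasons;

    ReasonedCommitFailedException(String message, Set<Reason> reasons) {
      super(message);
      // EnumSet.copyOf rejects an empty non-EnumSet collection, so handle that case explicitly.
      this.reasons = reasons.isEmpty() ? EnumSet.noneOf(Reason.class) : EnumSet.copyOf(reasons);
    }

    Set<Reason> reasons() {
      return reasons;
    }
  }

  // Stand-ins for Iceberg's AlreadyExistsException and NoSuchNamespaceException.
  static class AlreadyExistsException extends RuntimeException {
    AlreadyExistsException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  static class NoSuchNamespaceException extends RuntimeException {
    NoSuchNamespaceException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /** Narrower version of the catch clause in create(); keeps the original exception as cause. */
  static RuntimeException translate(ReasonedCommitFailedException e, String identifier) {
    if (e.reasons().equals(EnumSet.of(Reason.TABLE_ALREADY_EXISTS))) {
      return new AlreadyExistsException("Table was created concurrently: " + identifier, e);
    }
    if (e.reasons().equals(EnumSet.of(Reason.NAMESPACE_MISSING))) {
      return new NoSuchNamespaceException("Namespace does not exist for table: " + identifier, e);
    }
    // Zero or several reasons: surface the commit failure unchanged.
    return e;
  }
}
```

Subclassing CommitFailedException would work similarly; a reason set has the advantage of describing several simultaneous problems without a combinatorial class hierarchy.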
Edit: the CommitFailedException is thrown within the rather generic org.apache.iceberg.TableOperations#commit.
Edit2: it would probably also be nice to include the original cause in the AlreadyExistsException thrown in create(), and likely elsewhere as well.
With the latest Nessie 0.56.0 release, which includes #6492 and #6503, we can get more details about the individual "conflicts". If there is only one conflict, we can translate it into a more specific exception. I tried it locally:
Spot the bug in @nastra's test case ;)
I guess that happens if you manually modify code after copying it to GH :)
I think we can close this. Objections?
I checked the latest Iceberg code with the Nessie fixes, and from my POV it behaves (more) reasonably now. I saw some long exception traces in spark-sql in case of namespace validation errors, but that is probably another matter.
I'd be fine with closing this issue if @nastra agrees.
+1 on closing this
Recent Nessie servers require namespaces to exist before tables can be created in them. However, when a table creation attempt is made in a non-existent namespace, the Iceberg-side error is quite confusing. For example, in the SQL shell:
In fact, the Nessie-side error is about namespace db2 being missing. To continue: