hansetag / iceberg-catalog

A Rust implementation of the Iceberg REST Catalog specification.
Apache License 2.0

Split Metadata Model for Tables #33

Open c-thiel opened 3 months ago

c-thiel commented 3 months ago

Currently we store TableMetadata as one blob in Postgres. We should investigate two options:

corleyma commented 2 months ago

Don't store TableMetadata in Postgres at all, but in an object store instead

I think supporting this option would be a big boon to adoption, because:

I would consider adopting this catalog implementation today in my work if this were supported.

c-thiel commented 2 months ago

@corleyma thanks for your feedback! We already write the metadata to the table location today (as metadata/<uuid>-.gz.metadata.json). We additionally store it internally, and that internal copy is our source of truth. This internal storage is what bothers me most at the moment, because it is a large binary blob in Postgres. The location of the external file is returned to the client as part of the TableMetadata.

The write happens here (conditionally, only if the client did not "stage" the table): https://github.com/hansetag/iceberg-catalog/blob/f5f4185927e6e9822cd815f1716b62ff9267de7f/crates/iceberg-catalog/src/catalog/tables.rs#L157
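To make the dual-write model concrete, here is a minimal Rust sketch. It assumes in-memory `HashMap`s stand in for Postgres (the internal source of truth) and for the object store at the table location. `Catalog`, `create_table`, `load_table`, the `stage_create` flag and the zero-padded file naming are hypothetical, and `Debug` output stands in for JSON serialization. This is an illustration of the idea, not the project's code.

```rust
use std::collections::HashMap;

/// Simplified stand-in for Iceberg's `TableMetadata`.
#[derive(Clone, Debug, PartialEq)]
struct TableMetadata {
    table_uuid: String,
    schema_json: String,
}

/// Hypothetical catalog: `internal` plays the role of Postgres (source of truth),
/// `object_store` plays the role of the table location on S3 or similar.
#[derive(Default)]
struct Catalog {
    internal: HashMap<String, (TableMetadata, Option<String>)>,
    object_store: HashMap<String, String>,
    next_file_id: u64,
}

impl Catalog {
    /// Stores the metadata internally and, unless the create is staged,
    /// also writes a copy to the table location. Returns that copy's path.
    fn create_table(
        &mut self,
        name: &str,
        location: &str,
        meta: TableMetadata,
        stage_create: bool,
    ) -> Option<String> {
        let metadata_location = if stage_create {
            None
        } else {
            self.next_file_id += 1;
            // Hypothetical naming; the real catalog uses a UUID-based file name.
            let path = format!("{}/metadata/{:05}.metadata.json", location, self.next_file_id);
            // Debug output stands in for JSON serialization here.
            self.object_store.insert(path.clone(), format!("{:?}", meta));
            Some(path)
        };
        self.internal
            .insert(name.to_string(), (meta, metadata_location.clone()));
        metadata_location
    }

    /// Always answers from the internal store; the external file is never read,
    /// only its location is handed back to the client.
    fn load_table(&self, name: &str) -> Option<(TableMetadata, Option<String>)> {
        self.internal.get(name).cloned()
    }
}
```

Because `load_table` reads only the internal copy, overwriting the external file in `object_store` leaves the returned metadata unchanged, and a staged create writes nothing to the table location.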

Regarding integration with other tools, you are touching on an interesting point: we deliberately do not write the top-level metadata to a file named metadata.json, even though doing so would improve compatibility with other engines. There are two reasons:

  1. It could lead to inconsistencies on write: on some storage systems, such as S3, only the catalog can guarantee transactional commits.
  2. We don't want clients to have credentials in the first place.
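The first reason comes down to who performs the compare-and-swap on commit. Below is a minimal sketch, assuming a single catalog process whose `Mutex`-guarded map stands in for a database row: a commit succeeds only if the client's expected current metadata location still matches, so two writers that started from the same version cannot both win. `CommitLog`, `commit` and the location-based check are hypothetical simplifications. The Iceberg REST spec expresses such preconditions as commit requirements (for example `assert-ref-snapshot-id`), and in Postgres the same check could plausibly be a conditional `UPDATE ... WHERE` on the current location.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Why a commit was rejected.
#[derive(Debug, PartialEq)]
enum CommitError {
    NoSuchTable,
    /// Someone else committed first; the client must reload and retry.
    Conflict { current: String },
}

/// Hypothetical catalog-side pointer table: table name -> current metadata location.
struct CommitLog {
    current: Mutex<HashMap<String, String>>,
}

impl CommitLog {
    fn new() -> Self {
        CommitLog { current: Mutex::new(HashMap::new()) }
    }

    fn register(&self, table: &str, location: &str) {
        self.current
            .lock()
            .unwrap()
            .insert(table.to_string(), location.to_string());
    }

    /// Compare-and-swap: move the pointer to `new_location` only if it still
    /// equals `expected`. The check and the update happen under one lock, so
    /// no other commit can interleave between them.
    fn commit(&self, table: &str, expected: &str, new_location: &str) -> Result<(), CommitError> {
        let mut map = self.current.lock().unwrap();
        let cur = map.get_mut(table).ok_or(CommitError::NoSuchTable)?;
        if cur.as_str() != expected {
            return Err(CommitError::Conflict { current: cur.clone() });
        }
        *cur = new_location.to_string();
        Ok(())
    }
}
```

Without a single coordinator like this, clients writing metadata files directly with plain PUTs would get last-writer-wins semantics and could silently drop each other's changes.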

Technically, it wouldn't be a problem to support this as an optional feature. Is that what you have in mind?
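Making the stable-named file opt-in could look roughly like the following sketch. The `MetadataWriteConfig` struct, its `write_toplevel_metadata_json` flag and the assumed `<table location>/metadata.json` path are illustrative inventions, not existing options of the catalog; a `HashMap` again stands in for the object store.

```rust
use std::collections::HashMap;

/// Hypothetical catalog setting; not an existing option of the project.
#[derive(Clone, Copy, Default)]
struct MetadataWriteConfig {
    /// Off by default: only the versioned file is written unless the user opts in.
    write_toplevel_metadata_json: bool,
}

/// Writes the versioned metadata file and, if enabled, a stable-named copy.
/// Returns the paths that were written, in order.
fn write_metadata(
    store: &mut HashMap<String, String>,
    cfg: MetadataWriteConfig,
    table_location: &str,
    versioned_name: &str,
    body: &str,
) -> Vec<String> {
    let mut written = Vec::new();

    // Always written: a new, uniquely named file per commit.
    let versioned = format!("{}/metadata/{}", table_location, versioned_name);
    store.insert(versioned.clone(), body.to_string());
    written.push(versioned);

    if cfg.write_toplevel_metadata_json {
        // Assumed path. This copy is overwritten on every commit, outside any
        // transaction, so it is only a convenience mirror for other engines.
        let stable = format!("{}/metadata.json", table_location);
        store.insert(stable.clone(), body.to_string());
        written.push(stable);
    }
    written
}
```

The catalog's internal record stays the source of truth either way, so the consistency caveat from the first reason still applies to any engine that reads the mirror directly.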

With our model we follow what tabular.io does. They also don't write a top-level "metadata.json", and if you manually modify metadata/<uuid>-.gz.metadata.json it has no effect at all, because the metadata stored internally is what gets returned.