stackabletech / trino-operator


Introduce CRD for Iceberg table maintenance #484

Open sbernauer opened 9 months ago

sbernauer commented 9 months ago

As a Trino Iceberg user I want to define a CR that allows me to regularly run maintenance actions on my tables.

Should

Could

One possible solution would be to create a Kubernetes CronJob for every maintenance CR. The CRD could look something like this:

spec:
  target:
    catalog: lakehouse
    schema: default
    table: my_table # Optional
  schedule:
    interval: 24h # using new Duration struct
    # OR
    cronExpression: XXX
  actions:
    - name: optimize
      fileSizeThreshold: 100MB # optional, otherwise let Trino use its internal default
    - name: expire_snapshots
      retentionThreshold: 7d # optional, otherwise let Trino use its internal default
    - name: remove_orphan_files
      # Document: The value for retention_threshold must be higher than or equal to iceberg.remove_orphan_files.min-retention in the catalog, otherwise the procedure fails with a message similar to: Retention specified (1.00d) is shorter than the minimum retention configured in the system (7.00d)
      retentionThreshold: 7d # optional, otherwise let Trino use its internal default
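
To illustrate what the CronJob's workload might have to do with such a spec, here is a minimal Python sketch that turns the `target` and `actions` into Trino statements. It assumes the Iceberg connector's `ALTER TABLE ... EXECUTE` commands (`optimize`, `expire_snapshots`, `remove_orphan_files`); the function names and the mapping from CR fields to procedure parameters are hypothetical and not part of the operator. It also assumes `table` is set, so the "all tables in the schema" case is left out.

```python
# Hypothetical mapping from CR action fields to Trino procedure parameters.
ACTION_PARAMS = {
    "optimize": {"fileSizeThreshold": "file_size_threshold"},
    "expire_snapshots": {"retentionThreshold": "retention_threshold"},
    "remove_orphan_files": {"retentionThreshold": "retention_threshold"},
}


def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote a SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_statements(spec: dict) -> list[str]:
    """Build one ALTER TABLE ... EXECUTE statement per configured action."""
    target = spec["target"]
    table = ".".join(quote_ident(target[key]) for key in ("catalog", "schema", "table"))
    statements = []
    for action in spec["actions"]:
        name = action["name"]
        if name not in ACTION_PARAMS:
            raise ValueError(f"unsupported maintenance action: {name}")
        # Only pass parameters the user set, so Trino falls back to its defaults.
        args = ", ".join(
            f"{sql_name} => {quote_literal(action[field])}"
            for field, sql_name in ACTION_PARAMS[name].items()
            if field in action
        )
        suffix = f"({args})" if args else ""
        statements.append(f"ALTER TABLE {table} EXECUTE {name}{suffix}")
    return statements


if __name__ == "__main__":
    example_spec = {
        "target": {"catalog": "lakehouse", "schema": "default", "table": "my_table"},
        "actions": [
            {"name": "optimize", "fileSizeThreshold": "100MB"},
            {"name": "expire_snapshots", "retentionThreshold": "7d"},
            {"name": "remove_orphan_files"},
        ],
    }
    for statement in build_statements(example_spec):
        print(statement)
```

Omitting a threshold produces a statement without arguments, which matches the "let Trino use its internal default" behaviour described in the comments above.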

soenkeliebau commented 9 months ago

At the risk of killing this issue with scope creep: a while back we discussed having TrinoTable CRDs that the operator would read and then use to create the tables in Trino.

If that happens, I think that object should also contain the information described in this issue, rather than it being put into a separate CRD?

soenkeliebau commented 9 months ago

Or well... thinking some more... we actually might want to apply the same maintenance object to many tables...

sbernauer commented 9 months ago

Legit point, we should consider this when designing the CRD. I think it should be an ADR in any case.

therealslimjp commented 2 months ago

Is this still in active development, and has a release date already been determined?

sbernauer commented 2 months ago

Hi @therealslimjp, sadly we have not started any work on this yet, and I'm not aware of any ETA.