Open findepi opened 3 years ago
Hello,
I cannot find any other issue or anything else when searching for "trino retry iceberg commit error" besides this one, so I hope it makes sense!
I have configured a Trino deployment on Kubernetes with task retries to try to speed up concurrent writes to an iceberg table from multiple queries executed at once. I see that the configuration is correct because there are files being created in the location specified for the exchange manager. I have also set the following configuration in the helm chart:
additionalConfigProperties:
- retry-policy=TASK
- retry-attempts-per-task=20
- tolerant-execution-standard-split-size=128MB
The exchange manager is also configured separately.
What I see however is that when two queries are executed at once, error EXTERNAL ERROR — ICEBERG_COMMIT_ERROR
is raised and the query stops completely.
From the documentation on fault-tolerant execution I would have expected that only the failing task is re-executed instead of the whole query showing as failure. This is a long-ish query, takes ~1 minute to execute, while the task that fails is only at the end, taking ~500ms. I was hoping that by setting up task retries Trino would instead re-execute only this last task instead of the whole query.
Is this a wrong assumption? Could you please help me understand better what is the expected behavior?
Is this related to https://iceberg.apache.org/spec/#commit-conflict-resolution-and-retry? Any indication about when this will be implemented?