trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Add Iceberg catalog config property to set iceberg.worker.num-threads #11920

Open osscm opened 2 years ago

osscm commented 2 years ago

Related issue: https://github.com/trinodb/trino/issues/11708 (comment: https://github.com/trinodb/trino/issues/11708#issuecomment-1089566562)

TableScan::planFiles is executed in a shared Iceberg worker pool. The pool is shared among queries, so high concurrency of metadata-heavy queries can make things worse. This thread pool is also not managed by Trino, so its memory usage is not tracked.

So we should allow this property to be passed from the catalog configuration, so that it is not only set statically.

electrum commented 2 years ago

I think this needs to be fixed in the Iceberg library to allow per-catalog configuration rather than relying on a global system property. I don't see a way to fix this on the Trino side.
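To illustrate why a global system property is hard to override per catalog, here is a minimal sketch. It assumes the pool size is read once from the JVM-wide property iceberg.worker.num-threads when a holder class is initialized. SharedPoolSketch and SharedWorkerPool are hypothetical stand-ins, not Iceberg's actual classes, and the default value used here is also an assumption.

```java
public class SharedPoolSketch {
    // Hypothetical stand-in for a library-owned holder of the worker pool size.
    static final class SharedWorkerPool {
        // Evaluated once, when this class is first initialized in the JVM.
        // The fallback (available processors) is an assumption for illustration.
        static final int SIZE = Integer.getInteger(
                "iceberg.worker.num-threads",
                Runtime.getRuntime().availableProcessors());
    }

    public static void main(String[] args) {
        // A first "catalog" sets the property before the pool size is read.
        System.setProperty("iceberg.worker.num-threads", "4");
        int first = SharedWorkerPool.SIZE; // triggers class initialization

        // A second "catalog" tries to pick a different size afterwards.
        System.setProperty("iceberg.worker.num-threads", "16");
        int second = SharedWorkerPool.SIZE; // still the value read at initialization

        System.out.println(first + " " + second); // prints "4 4"
    }
}
```

Under this assumption, whichever value is present when the class initializes applies to every catalog in the process, which is why a per-catalog setting cannot be layered on top from the caller's side alone.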

raunaqmorarka commented 4 months ago

There is a planWith(ExecutorService executorService) API available, which we are using in IcebergSplitSource (configurable through "Make Iceberg split manager threads configurable"). We could extend its usage to all callers of planFiles to control the number of threads on the Trino side.
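A rough sketch of that idea: the caller owns a bounded executor and hands it to the planning call, instead of relying on a shared pool. To stay self-contained, the code below does not depend on Iceberg. BoundedPlanningSketch, its planFiles method, and newCatalogPlanningExecutor are hypothetical stand-ins, and the thread count of 2 is an arbitrary illustrative value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BoundedPlanningSketch {
    // Hypothetical stand-in for a scan that runs its per-manifest planning work
    // on a caller-supplied executor. With Iceberg on the classpath the analogous
    // call would look roughly like: table.newScan().planWith(executor).planFiles()
    static List<String> planFiles(List<String> manifests, ExecutorService executor) throws Exception {
        List<Future<String>> futures = new ArrayList<>();
        for (String manifest : manifests) {
            futures.add(executor.submit(() -> manifest + ":planned"));
        }
        List<String> tasks = new ArrayList<>();
        for (Future<String> future : futures) {
            tasks.add(future.get()); // collected in submission order
        }
        return tasks;
    }

    // Hypothetical factory: a fixed-size pool whose size would come from catalog config.
    static ExecutorService newCatalogPlanningExecutor(int threads) {
        return Executors.newFixedThreadPool(threads, runnable -> {
            Thread thread = new Thread(runnable, "catalog-planning");
            thread.setDaemon(true);
            return thread;
        });
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = newCatalogPlanningExecutor(2);
        try {
            // prints [m1:planned, m2:planned, m3:planned]
            System.out.println(planFiles(List.of("m1", "m2", "m3"), executor));
        } finally {
            executor.shutdown();
        }
    }
}
```

Because the executor is created and sized by the caller, each catalog could get its own limit, and the threads stay under the caller's control rather than in a pool shared by every query.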