Run dynamic partition pruning for split locally on a worker node for iceberg

trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Apache License 2.0

Run dynamic partition pruning for split locally on a worker node for iceberg #16299

Closed by raunaqmorarka 9 months ago

raunaqmorarka commented 1 year ago

Port the Hive optimization from https://github.com/trinodb/trino/pull/9869 to Iceberg. It should improve utilisation of dynamic filters that arrive late.

cc: @sopel39 @radek-starburst @alexjo2144

alexjo2144 commented 1 year ago

We can try this, but I'm not sure it'll have the same effect as it did for Hive. In Hive we keep a queue of up to 1000 splits in memory for some amount of time, so there can be a large gap between when a split is generated and when it is served to a worker. In Iceberg we don't have that queue: splits are generated on demand, so the gap between when they are generated and when they are served should be small.

sopel39 commented 1 year ago

@alexjo2144 splits get queued on the worker node too
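
To make the point concrete, here is a minimal sketch of why re-checking on the worker can help: a split may pass the coordinator-side check while the dynamic filter is still empty, then sit in the worker's queue until after the filter has been populated. All names here (`Split`, `DynamicFilter`, `shouldRead`, the `ds` column and the file path) are hypothetical and chosen for illustration; this is not Trino's SPI or the code from either PR.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

public class WorkerSidePruningSketch {
    // Hypothetical split: a data file plus the partition values it belongs to.
    record Split(String path, Map<String, String> partitionValues) {}

    // Hypothetical dynamic filter on a single partition column.
    // It is "unknown" (empty) until the join build side has finished.
    static final class DynamicFilter {
        private final String column;
        private final AtomicReference<Optional<Set<String>>> allowed =
                new AtomicReference<>(Optional.empty());

        DynamicFilter(String column) {
            this.column = column;
        }

        // Called when the build side completes and the allowed values are known.
        void complete(Set<String> values) {
            allowed.set(Optional.of(Set.copyOf(values)));
        }

        // Returns true if the split might still contain matching rows.
        boolean mayContain(Split split) {
            Optional<Set<String>> current = allowed.get();
            if (current.isEmpty()) {
                return true; // filter not available yet, nothing can be pruned
            }
            String value = split.partitionValues().get(column);
            // A split without a value for the column cannot be pruned safely.
            return value == null || current.get().contains(value);
        }
    }

    // The same check can run at split generation and again on the worker.
    static boolean shouldRead(Split split, DynamicFilter filter) {
        return filter.mayContain(split);
    }

    public static void main(String[] args) {
        DynamicFilter filter = new DynamicFilter("ds");
        Split split = new Split("data/ds=2023-01-01/a.parquet", Map.of("ds", "2023-01-01"));

        // At split generation the filter is still empty, so the split passes.
        boolean passedAtGeneration = shouldRead(split, filter);

        // The filter arrives late, while the split waits in the worker queue.
        filter.complete(Set.of("2023-02-01"));

        // Re-checking on the worker just before reading prunes the split.
        boolean readOnWorker = shouldRead(split, filter);

        System.out.println(passedAtGeneration + " " + readOnWorker); // prints "true false"
    }
}
```

The benefit depends on how long splits wait on the worker: the longer the gap between generation and processing, the more likely a late filter is to prune something.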

osscm commented 1 year ago

@raunaqmorarka that's interesting, I'm volunteering to work on this issue. I wonder whether it's worth first measuring how much slowness there is when this feature is not present.

sopel39 commented 1 year ago

@osscm are you working on this?

osscm commented 1 year ago

Hi @sopel39, yes I started on it.

raunaqmorarka commented 9 months ago

Implemented by https://github.com/trinodb/trino/pull/20212