trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Translate dynamic filter to compiled filter #13305

Open yikf opened 1 year ago

yikf commented 1 year ago

Currently, Trino supports dynamic filtering. After the dynamic filter operator collects a domain, the probe-side scan pushes that domain down to the data source, which uses it to skip data through index metadata (e.g. the Parquet/ORC footer statistics). If the domain's hit ratio against that metadata is low, the pushed-down domain can end up filtering nothing.

Say we have a Parquet file with a single row group whose column min/max is 0/100. If the domain's value set is 10-20, the domain is ineffective: it overlaps the row group's range, so no row group is skipped and the data participating in the join is not reduced.
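To make the example concrete, here is a minimal Java sketch of the two filtering levels. It is not Trino code: `canSkipRowGroup` and `countMatchingRows` are hypothetical helpers, and the row group is assumed to hold exactly the values 0..100.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

public class RowGroupPruningSketch {
    // Footer-level pruning: a row group can be skipped only when its
    // [min, max] statistics do not overlap the filter range [low, high].
    static boolean canSkipRowGroup(long groupMin, long groupMax, long low, long high) {
        return groupMax < low || groupMin > high;
    }

    // Row-level filtering: count the rows whose value falls inside [low, high].
    static long countMatchingRows(List<Long> rows, long low, long high) {
        return rows.stream().filter(v -> v >= low && v <= high).count();
    }

    public static void main(String[] args) {
        // Assumed data for illustration: one row group holding the values 0..100.
        List<Long> rows = LongStream.rangeClosed(0, 100).boxed().collect(Collectors.toList());

        boolean skipped = canSkipRowGroup(0, 100, 10, 20);
        long matching = countMatchingRows(rows, 10, 20);

        System.out.println("row group skipped: " + skipped);
        System.out.println("rows read: " + rows.size() + ", rows in domain: " + matching);
    }
}
```

With these assumed values the row group is not skipped, so all 101 rows reach the join even though only 11 of them fall inside the domain.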

Except for broadcast joins, dynamic filters come from the coordinator and are sent to the worker along with the task create/update request. My idea is this: after the worker receives the task, if the dynamicFilter is not empty, we immediately register it and collect it into localCollector. Then, in visitScanAndFilter, we obtain the domain and convert it to a static filter, so that it filters the data participating in the join at the row level.
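Roughly, the flow I have in mind looks like the sketch below. This is only an illustration under simplifying assumptions, not Trino's actual API: `RangeDomain`, `register`, `toRowPredicate` and `scanAndFilter` are hypothetical names, and the domain is reduced to a single closed range.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongPredicate;
import java.util.stream.Collectors;

public class DynamicFilterSketch {
    // Hypothetical, simplified domain: one closed range [low, high].
    static final class RangeDomain {
        final long low;
        final long high;

        RangeDomain(long low, long high) {
            this.low = low;
            this.high = high;
        }
    }

    // Hypothetical worker-side registry of collected domains, keyed by dynamic filter id.
    private final Map<String, RangeDomain> collected = new ConcurrentHashMap<>();

    // Called when a task update arrives carrying a non-empty dynamic filter.
    void register(String filterId, RangeDomain domain) {
        collected.put(filterId, domain);
    }

    // Convert the collected domain into a row-level predicate.
    // If nothing has been collected for this id, every row is accepted.
    LongPredicate toRowPredicate(String filterId) {
        RangeDomain domain = collected.get(filterId);
        if (domain == null) {
            return value -> true;
        }
        return value -> value >= domain.low && value <= domain.high;
    }

    // Apply the predicate to each scanned row before it reaches the join.
    List<Long> scanAndFilter(List<Long> rows, String filterId) {
        LongPredicate predicate = toRowPredicate(filterId);
        return rows.stream().filter(predicate::test).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        DynamicFilterSketch worker = new DynamicFilterSketch();
        List<Long> rows = Arrays.asList(0L, 5L, 10L, 15L, 20L, 50L, 100L);

        // Before registration no rows are filtered out.
        System.out.println(worker.scanAndFilter(rows, "df_1"));

        // After registration only rows inside the domain are kept.
        worker.register("df_1", new RangeDomain(10, 20));
        System.out.println(worker.scanAndFilter(rows, "df_1"));
    }
}
```

The point of the design is that the predicate is evaluated per row, so it still reduces the join input when footer statistics cannot skip any row group.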

yikf commented 1 year ago

@findepi what about this idea?

hashhar commented 1 year ago

cc: @raunaqmorarka

raunaqmorarka commented 1 year ago

I have a WIP PR for this, https://github.com/trinodb/trino/pull/5204, which I'm planning to revive and land in the near future.