Currently, Trino support dynamic filter, After dynamic operator collect domain, probe side scan push the domain down to datasource, filtering some data through the index(etc. parquet/orc footer), if these domain hit ratio is not high, domain failure is possible.
Let's say have a parquet file, it has one rowgroup and col min/max is 0/100, if domain value set is 10-20, domain failure(because not filter any rowgroup), data participating in the join operation is not reduced.
In addition to broadcast Join, other dynamic filters come from coordinators and is sent to the worker along with the taskCeatetOrUpdate, have a idea is that:
After the worker receives the task, if the dynamicFilter is not empty, we immediately register it and collect it to localCollector. Then, when visitScanAndFilter, we obtain the domain and convert it to static filter, so as to filter the data participating in join operation.
Currently, Trino support dynamic filter, After dynamic operator collect domain, probe side scan push the
domain
down to datasource, filtering some data through the index(etc. parquet/orc footer), if these domain hit ratio is not high, domain failure is possible.Let's say have a parquet file, it has one
rowgroup
and colmin/max
is0/100
, if domain value set is10-20
, domain failure(because not filter any rowgroup), data participating in the join operation is not reduced.In addition to broadcast Join, other dynamic filters come from coordinators and is sent to the worker along with the taskCeatetOrUpdate, have a idea is that: After the worker receives the task, if the dynamicFilter is not empty, we immediately register it and collect it to localCollector. Then, when
visitScanAndFilter
, we obtain the domain and convert it to static filter, so as to filter the data participating in join operation.