sparkutils / quality

A Quality Spark DQ Library
https://sparkutils.github.io/quality/
Apache License 2.0

DBR Optimizer Rule ignored #29

Closed chris-twiner closed 1 year ago

chris-twiner commented 1 year ago

When the underlying lower and higher uuid longs are included in the projection, the optimiser rule runs and the predicate is pushed down correctly (also on Photon), so

select * from (select lower, higher, as_uuid(lower,higher) context from thetable) where context = ''

is optimised, whereas

select * from (select as_uuid(lower,higher) context from thetable) where context = ''

is not.
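
For readers unfamiliar with what the pushdown buys, here is a minimal Python sketch of the idea: a filter on the uuid string can in principle be rewritten into equality filters on the two underlying longs, which a data source can then push down. This is not the library's implementation (the actual rule is a Spark optimiser rule). The helper names `uuid_to_longs` and `rewrite_predicate` are hypothetical, and the sketch assumes `higher` holds the most significant 64 bits and `lower` the least significant 64 bits, both as signed JVM-style longs.

```python
import uuid
from typing import Tuple

MASK64 = (1 << 64) - 1


def to_signed64(value: int) -> int:
    """Interpret an unsigned 64-bit value as a signed (JVM-style) long."""
    return value - (1 << 64) if value >= (1 << 63) else value


def as_uuid(lower: int, higher: int) -> str:
    """Build the canonical uuid string from two signed 64-bit halves.

    Assumption: `higher` is the most significant half, `lower` the least.
    """
    combined = ((higher & MASK64) << 64) | (lower & MASK64)
    return str(uuid.UUID(int=combined))


def uuid_to_longs(text: str) -> Tuple[int, int]:
    """Split a uuid string into (lower, higher) signed 64-bit longs."""
    value = uuid.UUID(text).int
    return to_signed64(value & MASK64), to_signed64(value >> 64)


def rewrite_predicate(literal: str) -> str:
    """Hypothetical rewrite of `context = '<uuid>'` into long comparisons."""
    lower, higher = uuid_to_longs(literal)
    return f"lower = {lower} AND higher = {higher}"


if __name__ == "__main__":
    print(rewrite_predicate("00000000-0000-0001-0000-000000000002"))
```

The rewritten form compares plain long columns, which is why having `lower` and `higher` reachable from the filter matters for the optimisation.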

chris-twiner commented 1 year ago

It looked like an oddity of DBR that the predicates trigger use of the rule. Instead, the issue seems to be with .show / display: OSS show uses the logical plan rather than an optimized one, and the assumption is that display works the same way. If so, the notebook performance results cannot be trusted, and a collect or toLocalIterator.next must be used to measure against instead.
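
A minimal sketch of measuring without show / display, assuming a PySpark session: time how long it takes to pull the first row from an iterator over the results. The helper `time_first_row` is hypothetical and deliberately generic so it runs without Spark; the Spark usage is shown only in a comment.

```python
import time
from typing import Any, Callable, Iterator, Tuple


def time_first_row(make_iterator: Callable[[], Iterator[Any]]) -> Tuple[Any, float]:
    """Return the first item from the iterator and the seconds taken to get it.

    The iterator is created inside the timed region so that any work done
    when the action starts is included in the measurement.
    """
    start = time.perf_counter()
    first = next(make_iterator(), None)  # None if there are no rows
    elapsed = time.perf_counter() - start
    return first, elapsed


if __name__ == "__main__":
    # Stand-in data so the sketch runs anywhere. With PySpark one might call:
    #   time_first_row(lambda: df.toLocalIterator())
    # or time df.collect() in the same way.
    row, seconds = time_first_row(lambda: iter([("a", 1), ("b", 2)]))
    print(row, seconds)
```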

chris-twiner commented 1 year ago

Thanks to help from Sandeep @ Databricks, it seems a limit is added for displays, which stopped the lineage tracking. A general solution has been added and verified on DBR 12.2.