sparkutils / quality

A Quality Spark DQ Library
https://sparkutils.github.io/quality/
Apache License 2.0

Handle conversion issues - retain the previous comparison on extensions #19

Closed chris-twiner closed 1 year ago

chris-twiner commented 1 year ago

Predicate pushdown may be cancelled if the underlying field is actually a string. The tests have only assumed there isn't a string there. This probably requires keeping both in the transform, as AND( rewritten, orig ).

chris-twiner commented 1 year ago

Closing until I can find a scenario where this can actually be triggered.

chris-twiner commented 1 year ago

Re-opened, as the issue is caused by string values that are not legitimate uuids or ids. Such an expression may be reasonable before optimisation (e.g. a user's shortcut string comparison for non-equals), so the original should be kept.

"In" would not be foldable to verify all of its content; equals could test and immediately reject, etc.
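To make the distinction concrete, here is a toy sketch in plain Python, not Spark or Quality code. The function name `plan_comparison`, the three outcome labels, and the assumption that the column's string form is always the canonical lowercase hyphenated uuid are all hypothetical and only for illustration.

```python
import uuid


def is_canonical_uuid(text):
    """True only for the canonical lowercase, hyphenated uuid form (an assumption of this sketch)."""
    try:
        return str(uuid.UUID(text)) == text
    except ValueError:
        return False


def plan_comparison(op, literal):
    """Decide how a toy optimiser might treat `uuid_string <op> literal`."""
    if is_canonical_uuid(literal):
        # The literal converts cleanly, so comparing the underlying longs is safe.
        return "rewrite"
    if op == "=":
        # Under the canonical-form assumption, no stored uuid can equal this literal.
        return "reject"
    # A non-equals string comparison (e.g. a prefix shortcut) still has meaning as text.
    return "keep-original"


print(plan_comparison("=", "123e4567-e89b-12d3-a456-426614174000"))
print(plan_comparison("=", "abc"))
print(plan_comparison(">", "abc"))
```

An In list would need every element checked in this way before it could be rewritten, which is the folding problem noted above.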

chris-twiner commented 1 year ago

A general Or wrapping (with transformUp) fails to push down, related to https://github.com/apache/spark/pull/35669. Ignoring In, wrapping using If will push down if it's a constant, but otherwise not, and still at the cost of transformUp. Checking for In causes iteration failures on the batches, as In can't optimise further. As such, the safest route is for the conversion functions to return nulls.
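A minimal sketch of the null-on-conversion idea, again as a toy Python model rather than the library's implementation. The names `uuid_to_longs` and `longs_equal` are hypothetical, and splitting the uuid into two unsigned 64-bit halves is an assumption made for illustration.

```python
import uuid

MASK64 = (1 << 64) - 1


def uuid_to_longs(text):
    """Convert a uuid string to (high, low) 64-bit halves, or None if it is not a uuid."""
    try:
        value = uuid.UUID(text).int
    except ValueError:
        # Conversion failure yields a null instead of an error.
        return None
    return (value >> 64, value & MASK64)


def longs_equal(row_longs, literal):
    """SQL-style equality: an unconvertible literal gives None (unknown), not an exception."""
    converted = uuid_to_longs(literal)
    if converted is None:
        return None
    return row_longs == converted


row = uuid_to_longs("00000000-0000-0001-0000-000000000002")
print(row)
print(longs_equal(row, "00000000-0000-0001-0000-000000000002"))
print(longs_equal(row, "not-a-uuid"))
```

With SQL three-valued logic a filter treats the unknown result as not matching, so an invalid literal simply selects no rows and needs no extra wrapping expression.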