sparkutils / quality

A Quality Spark DQ Library
https://sparkutils.github.io/quality/
Apache License 2.0

simplify aggregate dq result checks for individual rules #28

Closed chris-twiner closed 1 year ago

chris-twiner commented 1 year ago

Currently, if you first generate row-level rule results and then aggregate over specific rules, you have to use SQL similar to:

filter(map_values(DataQuality.ruleSetResults), ruleSet -> size(filter(map_values(ruleSet.ruleResults), result -> probability(result) > 0.3)) > 0)

which is obviously hideous, as is flattening the results just to filter against them.
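To make the nesting in that expression easier to follow, here is a minimal plain-Python sketch of what it computes for a single row. It is only an illustration, not the library's implementation: the dictionary layout, the (id, version) tuple keys and the sample values are assumptions, and the results are stored directly as floats rather than being decoded by probability().

```python
# Hypothetical stand-in for one row's DataQuality column.
# Keys are made-up (id, version) pairs; values are made-up probabilities.
data_quality = {
    "ruleSetResults": {
        (1, 1): {"ruleResults": {(10, 1): 0.9, (11, 1): 0.1}},
        (2, 1): {"ruleResults": {(20, 1): 0.2}},
    }
}


def rule_sets_with_result_above(dq, threshold):
    """Keep every rule set that has at least one rule result above threshold.

    The outer comprehension plays the role of the outer filter(map_values(...)),
    the inner one the role of size(filter(map_values(...))) > 0.
    """
    return [
        rule_set
        for rule_set in dq["ruleSetResults"].values()
        if len([r for r in rule_set["ruleResults"].values() if r > threshold]) > 0
    ]


matching = rule_sets_with_result_above(data_quality, 0.3)
```

Note that the expression selects whole rule sets in which any rule passes the threshold; it never names an individual rule, which is the gap this issue is about.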

Something similar to updateFields is needed:

ruleResult(suite, Id(set,version), Id(rule, version))

The array may not be ordered, however, so the lookup ends up as a scan.
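A rough sketch of why such a lookup turns into a scan, in plain Python rather than Spark. It assumes the results are held as parallel key and value arrays in no particular order; the rule_result and lookup names, the tuple-based Id and the sample data are hypothetical and only mirror the shape of the call proposed above.

```python
from typing import List, Optional, Tuple

Id = Tuple[int, int]  # hypothetical (id, version) pair


def lookup(keys: List[Id], values: list, wanted: Id):
    """Linear scan: with unordered keys there is no way to binary search."""
    for key, value in zip(keys, values):
        if key == wanted:
            return value
    return None


def rule_result(suite, set_id: Id, rule_id: Id) -> Optional[float]:
    """Find one rule's result: scan the rule sets, then scan that set's rules."""
    set_keys, set_values = suite
    rule_set = lookup(set_keys, set_values, set_id)
    if rule_set is None:
        return None
    rule_keys, rule_values = rule_set
    return lookup(rule_keys, rule_values, rule_id)


# Made-up suite: keys deliberately out of order to show that a scan is required.
suite = (
    [(2, 1), (1, 1)],
    [
        ([(20, 1)], [0.2]),
        ([(11, 1), (10, 1)], [0.1, 0.9]),
    ],
)

found = rule_result(suite, (1, 1), (10, 1))
missing = rule_result(suite, (3, 1), (10, 1))
```

The cost is linear in the number of rule sets plus the number of rules in the matching set, which is usually small for a single row but is paid on every row.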