sodadata / soda-core

:zap: Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
https://go.soda.io/core-docs
Apache License 2.0

Dataset filter is not supported in "for each dataset", cross and reference checks #1815

Open nikolaPlume opened 1 year ago

nikolaPlume commented 1 year ago

Dataset filters are very useful, but they are not supported in "for each dataset" checks. To give some context on where this would help, consider periodic checks (weekly, daily, hourly) as the simplest example. We do not want to run checks on the entire table every day or hour; we only want to check the data from the last 24 hours (plus some other filters). So a filter like where cloud = 'ci' and dt > '2023-02-23' should be applied to each table in the "for each" section.
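Until something like this is supported, one possible workaround is to generate a separate filter block and checks block per table. The sketch below is only an illustration, not part of soda-core: the function name render_filtered_checks, the table names orders and sessions, the filter name daily, and the row_count check are all made up for the example. It assumes the documented SodaCL dataset filter layout ("filter DATASET [name]:" with a "where:" key, followed by "checks for DATASET [name]:").

```python
def render_filtered_checks(datasets, filter_name, where, checks):
    """Render SodaCL text: one dataset filter plus one checks block per dataset.

    Hypothetical helper for illustration; it only builds a string and does
    not call any Soda API.
    """
    blocks = []
    for dataset in datasets:
        lines = [
            f"filter {dataset} [{filter_name}]:",
            f"  where: {where}",
            "",
            f"checks for {dataset} [{filter_name}]:",
        ]
        # Every dataset gets the same list of checks under the same filter.
        lines += [f"  - {check}" for check in checks]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Table names and the check are assumptions, not taken from a real project.
    print(
        render_filtered_checks(
            datasets=["orders", "sessions"],
            filter_name="daily",
            where="cloud = 'ci' and dt > '2023-02-23'",
            checks=["row_count > 0"],
        )
    )
```

The generated text can be written to a checks file and passed to a scan as usual. This duplicates the filter for every table, which is exactly the repetition that filter support in "for each dataset" would remove.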

Dataset filters would be just as useful in reference and cross checks. Sometimes we want to compare datasets using the same filter (day, deployment, start_ts...), either within the same data source (reference) or across different data sources such as PostgreSQL and Databricks_SQL (cross).

jmarien commented 1 year ago

SODA-1454