sodadata / soda-core

:zap: Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
https://go.soda.io/core-docs
Apache License 2.0
1.92k stars 210 forks

Clarify batch testing strategies #2168

Closed: tombaeyens closed this 1 month ago

tombaeyens commented 2 months ago


The reason I think we should discuss this is that there is a problem with the current approach. In our current approach, we don't provide the necessary transparency to users, and in-pipeline tests may leave some records untested. We also have to make sure that the unified concept includes all the scheduling and triggering we need, so that our CEs don't have to start building workarounds around it.

If we want a unified platform and a prescriptive approach, then we need to ensure that our vision includes each of the strategies our customers need.

This is related to the scheduling discussion.

Background:

Backfilling also needs to be considered as part of this. We know our CEs have to answer questions on this topic. What will our unified solution ultimately look like when customers ask for backfilling?

tools-soda commented 2 months ago

CLOUD-8490

dirkgroenen commented 1 month ago

Moved to https://sodadata.slite.com/api/s/JSEdSB5RuxcRwR/Batch-testing-strategy