We have an opportunity to improve visibility into the documentation pipeline by introducing data quality checks. These checks should:
- produce differential data quality reports in PR comments, similar to how schema-tools do it
- feed data to S3, tagged with Git references, so historical analysis can be done with SQL tooling
The easiest path forward is perhaps to build on the existing infrastructure that emits quality reports for the examples pipeline. For example, this ticket discusses flushing that data to S3 more fully for a Metabase integration:
The PR integration could work by storing a branch-latest version of the report in the GitHub Actions cache and comparing each PR's report against it.
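As a rough illustration of the comparison step, here is a minimal Python sketch. It assumes each report is a flat mapping from metric name to a number, which may not match what the examples pipeline actually emits. The function names (`diff_reports`, `render_comment`) and the metric names in the usage note are hypothetical. Restoring the baseline from the cache and posting the comment are left out.

```python
from typing import Dict, List, Optional, Tuple

# Assumed report shape: metric name -> numeric value (hypothetical).
Report = Dict[str, float]
DiffRow = Tuple[str, Optional[float], Optional[float]]


def diff_reports(baseline: Report, current: Report) -> List[DiffRow]:
    """Return (metric, baseline value, current value) for every metric that changed.

    A metric present on only one side is reported with None on the other side.
    Unchanged metrics are omitted.
    """
    rows: List[DiffRow] = []
    for name in sorted(set(baseline) | set(current)):
        before = baseline.get(name)
        after = current.get(name)
        if before != after:
            rows.append((name, before, after))
    return rows


def _fmt(value: Optional[float]) -> str:
    """Render a missing metric as 'n/a' and anything else via str()."""
    return "n/a" if value is None else str(value)


def render_comment(rows: List[DiffRow]) -> str:
    """Render the changed metrics as a Markdown table suitable for a PR comment."""
    if not rows:
        return "No data quality changes against the branch-latest report."
    lines = ["| Metric | Baseline | This PR |", "| --- | --- | --- |"]
    for name, before, after in rows:
        lines.append(f"| {name} | {_fmt(before)} | {_fmt(after)} |")
    return "\n".join(lines)
```

For example, `render_comment(diff_reports({"missing_titles": 4}, {"missing_titles": 2}))` would produce a one-row table for the hypothetical `missing_titles` metric. Omitting unchanged metrics keeps the comment short, but a full table may be preferable if reviewers want the complete picture.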