opensource-observer / oso

Measuring the impact of open source software
https://opensource.observer
Apache License 2.0
54 stars 13 forks source link

Explore using static SQL analysis as part of CI #964

Open ravenac95 opened 5 months ago

ravenac95 commented 5 months ago

Describe the feature you'd like to request

We could possibly run static analysis of BigQuery SQL as a way to quality control the models that we get. This would also make it easy to know how/when we will need to change the schema.yml files for each of the models.

Describe the solution you'd like

Tools like ZetaSQL allow for static analysis of the sql code. We would probably want to walk the dbt model's graph and analyze each of the compiled SQL queries for each model. From there we could then compare the schema files with what the analyzed query results are as a quality check.

Describe alternatives you've considered

None, but open to possible alternatives. This could allow for us to provide quick feedback before running a pipeline to deploy to the playground.

ryscheng commented 3 months ago

CI/CD should prevent "major version" changes to mart models

A case study: This PR led to source being removed from code_metrics_by_project.

This led to a really generic build error: https://github.com/opensource-observer/oso/actions/runs/8808763868/job/24178415213 " t [ApolloError]: database query error"

The fix is to refresh the Hasura schemas and then remove source from our frontend queries https://github.com/opensource-observer/oso/pull/1284

Blameless retro, tbh when you're making complicated changes like the original PR, it's really hard to detect these schema changes. I don't think I would have noticed in a code review

ryscheng commented 3 months ago

Another alternative is to use Recce https://datarecce.io/docs/get-started-jaffle-shop/

Gonna give this a try, might be easier

ryscheng commented 2 months ago

Another option is sqlmesh https://github.com/opensource-observer/oso/issues/1456