sodadata / soda-sql

Soda SQL and Soda Spark have been deprecated and replaced by Soda Core. docs.soda.io/soda-core/overview.html
https://docs.soda.io/
Apache License 2.0

[FEATURE] Soda ingest dbt artifacts from the dbt cloud #167

Closed JCZuurmond closed 2 years ago

JCZuurmond commented 2 years ago

Is your feature request related to a problem? Please describe.
We would like to ingest the dbt artifacts from the dbt cloud.

Describe the solution you'd like
Ability to ingest the dbt artifacts from the dbt cloud.

We can use the dbt cloud API for this. Rough logic would be:

  1. Authenticate
  2. Get the runs
  3. Filter runs for which we would like to ingest the artifacts, e.g. based on:
    • time window
    • job name
    • id
     The filter info should be provided by the user, though we could have a default time window.
  4. Get the run artifacts
  5. Use the ingest dbt logic to upload the test results to the Soda cloud.
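
The filtering in step 3 could be sketched roughly as below. This is only an illustration, not existing Soda code: `RunFilter`, `default_filter` and `select_runs` are hypothetical names, and the run fields used here (`id`, `job_name`, `finished_at` as an ISO 8601 string with a UTC offset) are assumptions about what we would extract from the API response.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional


@dataclass
class RunFilter:
    """Hypothetical user-supplied filter; every field is optional."""
    since: Optional[datetime] = None   # start of the time window
    job_name: Optional[str] = None
    run_id: Optional[int] = None


def default_filter(now: datetime, window: timedelta = timedelta(days=1)) -> RunFilter:
    """Default used when the user gives no filter: runs finished in the last `window`."""
    return RunFilter(since=now - window)


def select_runs(runs: Iterable[dict], flt: RunFilter) -> List[dict]:
    """Keep only the runs that match every criterion set on the filter."""
    selected = []
    for run in runs:
        if flt.run_id is not None and run["id"] != flt.run_id:
            continue
        if flt.job_name is not None and run.get("job_name") != flt.job_name:
            continue
        if flt.since is not None:
            # Assumes `finished_at` carries an explicit offset, e.g. "+00:00".
            finished = datetime.fromisoformat(run["finished_at"])
            if finished < flt.since:
                continue
        selected.append(run)
    return selected
```

Combining all criteria with a logical AND keeps the behaviour predictable; whether that is what users want for sporadic, trigger-based jobs is still open.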

At minimum the user needs to provide the API token and the account id.
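
As a rough sketch of how those two inputs would be used, the snippet below only builds the request for listing runs and does not send it. The host, the `/api/v2/accounts/{account_id}/runs/` path, the `Token` authorization scheme and the `offset`/`limit` parameters are my understanding of the dbt Cloud admin API and should be checked against its documentation; `build_runs_request` is a hypothetical helper name.

```python
import urllib.request

# Assumed default host; accounts on other dbt Cloud instances would differ.
DBT_CLOUD_BASE_URL = "https://cloud.getdbt.com/api/v2"


def build_runs_request(
    api_token: str, account_id: int, offset: int = 0, limit: int = 100
) -> urllib.request.Request:
    """Build (but do not send) a request that lists runs for one account."""
    url = (
        f"{DBT_CLOUD_BASE_URL}/accounts/{account_id}/runs/"
        f"?offset={offset}&limit={limit}"
    )
    headers = {
        "Authorization": f"Token {api_token}",  # assumed auth scheme
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers)
```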

To consider: how to handle the paging of the API.
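
One possible way to handle paging, assuming the API uses offset/limit style pagination (to be verified), is a small generator that keeps requesting pages until a short page comes back. `iter_all_runs` and `fetch_page` are hypothetical names; `fetch_page` stands in for whatever function actually calls the API.

```python
from typing import Callable, Iterator, List


def iter_all_runs(
    fetch_page: Callable[[int, int], List[dict]], page_size: int = 100
) -> Iterator[dict]:
    """Yield every run by calling `fetch_page(offset, limit)` until exhausted."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        # A page shorter than the requested size means there is nothing left.
        if len(page) < page_size:
            return
        offset += page_size
```

Because it is a generator, the caller can stop early, for example once the runs fall outside the requested time window, without fetching the remaining pages.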

Additional context
I think that the tricky part will be deciding which runs to upload the artifacts for. If users have an advanced job structure that runs sporadically (e.g. based on a trigger for when data is ingested into their warehouse) instead of on a (daily) schedule, it can become difficult to decide which artifacts to ingest. We should definitely check first whether the alternative below works.

Alternative
Test if a user can install soda in their job definition and add the soda ingest dbt command there. If so, then we do not need this issue per se.

bastienboutonnet commented 2 years ago

Totally a good one! And also on the roadmap.

The nice thing is that we can totally reuse the work you have done. We could consider going down the full API road: dbt has two APIs, the admin API and the metadata API. The nice thing is that the admin API just gives us access to the artifacts.

This means we can get that resolved pretty fast and then debate whether it might make more sense to talk to the metadata API (https://docs.getdbt.com/docs/dbt-cloud/dbt-cloud-api/metadata/metadata-overview) because maybe it's better for a variety of reasons (I don't know for now).

So I would say, let's first make sure our mechanism and integration with the OSS approach are working, and then we can tackle this. We'll probably need to get ourselves a cloud account to test this properly.

Thanks a lot for raising this proactively, I really like your thinking!