z3z1ma / dbt-osmosis

Provides automated YAML management, a dbt server, streamlit workbench, and git-integrated dbt model output diff tools
https://z3z1ma.github.io/dbt-osmosis/
Apache License 2.0

Mocking resources in the dbt manifest? #26

Closed · louis-vines closed 1 year ago

louis-vines commented 1 year ago

I just came across this package yesterday and it looks like it will be incredibly useful. I'm particularly interested in being able to run SQL against my dbt project through the Python interface. I have a question...

Once we've loaded the project into memory like so:

runner = DbtProject(
    project_dir=...,
    target="prod",
)

how easy do you think it would be to mock/patch models in the runner prior to running individual SQL queries against it?

If this feature were available, combined with the provided ability to run SQL against a pre-compiled version of the manifest, it would be quite straightforward to extend this package to support unit testing your dbt models.

I'm quite keen to take a look at how to do this myself, to be honest, as it's something I've been mulling over for a long time now! I'm just keen to hear if this sounds doable before I dig in.

z3z1ma commented 1 year ago

@louis-vines

It would be pretty trivial. You can loop through models, look up models (like a ref(...) call does), etc., and each model will have an attribute called raw_code in dbt-core >= 1.3 or raw_sql in < 1.3 (I have a constant that does the version tuple check so we can access the right attribute via getattr).
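
To make that concrete, here is a rough sketch of that loop, assuming the DbtProject class from this package; the import path, the RAW_CODE constant name, and the model/relation names are assumptions for illustration, not verbatim API:

from dbt.version import __version__ as dbt_version
from dbt_osmosis.core.osmosis import DbtProject  # import path may differ by version

# Mirror the version check described above: raw_code for dbt-core >= 1.3, raw_sql before.
RAW_CODE = "raw_code" if tuple(map(int, dbt_version.split(".")[:2])) >= (1, 3) else "raw_sql"

project = DbtProject(project_dir="path/to/project", target="prod")

for unique_id, node in project.dbt.nodes.items():
    if node.resource_type == "model":
        sql = getattr(node, RAW_CODE)
        # Patch the model's SQL in place before compiling/running it, e.g. to
        # swap a ref for a fixture table (names here are purely hypothetical).
        setattr(node, RAW_CODE, sql.replace("{{ ref('orders') }}", "fixtures.orders"))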

You can manipulate this as needed, then run the model, run the SQL, etc. You can also dynamically patch in macros or override macros from Python, which is also interesting in the case where you want to replace a source or ref macro with something more complex without relying on the user having an override in their project. In this case, consider the possibility of using a CTAS to extract a subset of data from a model's immediate parents to essentially "pin" the input. You can then run the transformation and "pin" an output deemed correct by the user. Future unit tests can use these pinned states of data stored in some dataset/schema in the data warehouse. That's how I envision it, at least.
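
A loose sketch of that pinning flow, under the assumption that the interface exposes get_ref_node and an execute_sql-style method (the relation names, model names, and method signature here are illustrative, not confirmed API):

from dbt_osmosis.core.osmosis import DbtProject  # import path may differ by version

project = DbtProject(project_dir="path/to/project", target="prod")

# Look up an immediate parent of the model under test and pin a small,
# deterministic subset of its data into a fixtures schema via CTAS.
parent = project.get_ref_node("stg_orders")  # model name is hypothetical
pinned = "unit_test_fixtures.stg_orders__pinned"
project.execute_sql(  # method name assumed from the "run SQL" capability discussed above
    f"create or replace table {pinned} as "
    f"select * from {parent.schema}.{parent.name} limit 100"
)

# A patched ref()/source() macro (or a string replacement on raw_code/raw_sql)
# can then resolve stg_orders to the pinned relation, so the model under test
# reads frozen inputs; its output can be pinned the same way and compared later.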

z3z1ma commented 1 year ago

What I am describing is very high-level, but if you want to dig into the low-level stuff I am pointing at, let me know; otherwise have at it!

The ref/source-like functions, BTW, are get_ref_node and get_source_node. Alternatively: project = DbtProject(...); for node in project.dbt.nodes: ...
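
Put together, the two lookup styles might look roughly like this (the argument names and order for get_ref_node and get_source_node are assumptions):

from dbt_osmosis.core.osmosis import DbtProject  # import path may differ by version

project = DbtProject(project_dir="path/to/project", target="prod")

# Targeted lookups, analogous to ref('customers') and source('raw', 'orders') in SQL.
customers_node = project.get_ref_node("customers")
raw_orders_node = project.get_source_node("raw", "orders")

# Or walk the whole manifest; nodes is a dict keyed by unique_id.
for unique_id, node in project.dbt.nodes.items():
    print(unique_id, node.resource_type)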

louis-vines commented 1 year ago

Nice - I'm going to take a look over the next couple of weeks when I've got some free time. Would you be interested in adding this feature to the project if I can get it working? Shall I try putting in a WIP PR when I've got something?

z3z1ma commented 1 year ago

@louis-vines

My suggestion going forward is to use https://github.com/z3z1ma/dbt-core-interface as a single-file dependency: copy-paste it into your project and import it, so you get the benefits of the interface + server with no additional dependency. It will also make its way to PyPI eventually.
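
For example, after vendoring the file (or installing it once it lands on PyPI), usage should look roughly like this; the import path is an assumption based on the repo name:

from dbt_core_interface import DbtProject  # import path assumed

project = DbtProject(project_dir="path/to/project", target="prod")
for unique_id in project.dbt.nodes:  # same manifest access as discussed above
    ...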