gouline / dbt-metabase

dbt + Metabase integration
https://pypi.org/project/dbt-metabase/
MIT License
448 stars 64 forks source link

Exposure Parsing #38

Closed z3z1ma closed 2 years ago

z3z1ma commented 3 years ago

Description

Reposting because some branch renaming fudged up the PR. This is still ready for review.

Parsing exposures from Metabase automatically in relation to a dbt project. This function is huge for understanding exposures within your data model. Close the loop between model composition and tangible BI exposures of those models creating an extremely valuable company asset in the dbt docs. In the nature of this project, the aim is we can use almost the same invocation parameters users set up in CI / CD for dbt-metabase export as they can with dbt-metabase exposures. This lets you run the invocation in production to essentially keep you docs in full sync with BI.

Type of change

How Has This Been Tested?

Open to ideas but ultimately I think making a mock api from metabase running on top of a database that has been seeded and ran through with the dbt jaffle shop project would be most robust. Just a lot of work.

Test Configuration:

Checklist:

References #22

z3z1ma commented 3 years ago

@gouline

This PR is ready to rock, I particularly think you will like the unit test structure (full mock api) as well as utility suite for recreation I have put together for the mb client. But otherwise we are good.

z3z1ma commented 3 years ago

@gouline

I think we have addressed the concerns now well. the cli and program interface is greatly improved as well as programmatic through the config objs and arg groups

image

dbt-metabase models --dbt_database test --dbt_manifest_path tests/fixtures/sample_project/target/manifest.json --metabase_host localhost:3000 --metabase_user alex@... --metabase_password "..." --metabase_database unit_testing --metabase_use_http
dbt-metabase exposures --dbt_database test --dbt_manifest_path tests/fixtures/sample_project/target/manifest.json --metabase_host localhost:3000 --metabase_user alex@... --metabase_password "..." --metabase_database unit_testing --metabase_use_http

Example tested commands above show how easy it is to use one command or another. The latter command defaults the output yaml name to metabase_exposures.yml and the output path to the local dir. Both of those could be provided as args but for the sake of the example- the commands with minimal args can be used interchangeably.

erika-e commented 3 years ago

@z3z1ma I had a go at testing this with my local setup. I ran into two problems. They may be errors related to how I got setup to test, so I'm open to feedback on a better way to go about testing! I'm running metabase in docker, connected to my development schema and using a manifest.json generated from a development run. I have Python 3.9.5 in the venv I'm using for testing.

I installed this branch to the venv I've been using for testing with pip3 install git+https://github.com/z3z1ma/dbt-metabase.git@exposures

The output of pip freeze now includes dbt-metabase @ git+https://github.com/z3z1ma/dbt-metabase.git@32f409c83ae6af77963f25aba573c6b0334991ee

Initially trying to run dbt-metabase I got an error -- will leave another comment about that, since I found a workaround for testing purposes.

I can run dbt-metabase models I had to add a --schema_excludes argument to avoid errors which I didn't need previously.

I get an error when I try to run dbt-metabase exposures :

2021-07-23 11:07:08,763 - INFO - Session established successfully
Traceback (most recent call last):
  File "/Users/erikapullum/metabase-python/metabase-env/bin/dbt-metabase", line 6, in <module>
    dbtmetabase.main(sys.argv[1:])
  File "/Users/erikapullum/metabase-python/metabase-env/lib/python3.9/site-packages/dbtmetabase/__init__.py", line 345, in main
    exposures(
  File "/Users/erikapullum/metabase-python/metabase-env/lib/python3.9/site-packages/dbtmetabase/__init__.py", line 142, in exposures
    dbt_models = reader.read_models(
UnboundLocalError: local variable 'reader' referenced before assignment
erika-e commented 3 years ago
UnboundLocalError: local variable 'reader' referenced before assignment

Resolved by specifying the --dbt_manifest_path. I did not get an error message to inform me that this argument was required when running dbt-metabase exposures without it

z3z1ma commented 3 years ago

@erika-e

Thanks so much for your time so far!

Here are some minimal working examples I just ran now from CLI I personally use for testing on docker containers (I also use my branches in production containers personally being an avid stress tester)

echo dbt models manifest parser example
dbt-metabase models --dbt_database test --dbt_manifest_path tests/fixtures/sample_project/target/manifest.json --metabase_host localhost:3000 --metabase_user alex@... --metabase_password "..." --metabase_database unit_testing --metabase_use_http

echo dbt exposures yaml parser example
dbt-metabase exposures --dbt_database test --dbt_path tests/fixtures/sample_project --metabase_host localhost:3000 --metabase_user alex@... --metabase_password ... --metabase_database unit_testing --metabase_use_http

echo dbt exposures manifest parser example
dbt-metabase exposures --dbt_database test --dbt_manifest_path tests/fixtures/sample_project/target/manifest.json --metabase_host localhost:3000 --metabase_user alex@... --metabase_password "..." --metabase_database unit_testing --metabase_use_http

Can we explore why your invocation did not work or what specific flags you supplied? Definitely sounds like an assertion issue or some combination of flags that shouldve supplied you an informative alert. Schema excludes shouldnt ever be required either.

erika-e commented 3 years ago

@z3z1ma Sure thing! Invocations below. Edited to add that we're still on dbt 0.19.1 but I can upgrade and generate a manifest with 0.20 if we don't expect compatibility with 0.19.1

dbt-metabase models invocation ``` dbt-metabase models \ --dbt_manifest_path dbt_manifest_semantic_types/manifest.json \ --dbt_database our_database \ --metabase_host localhost:3000 \ --metabase_user eep@... \ --metabase_password '...' \ --metabase_database EPullum_Dev \ --metabase_use_http \ --schema DEV_EPULLUM \ --schema_excludes 'DEV_EPULLUM_STG' 'DEV_EPULLUM_EX' ``` Without the `--schema_excludes` I get errors for tables in the excluded schemas that are not found.

What is the purpose of the ** around the --dbt_manifest_path argument? That's the first difference that jumps out.

dbt-metabase exposures invocation ``` dbt-metabase exposures \ --dbt_manifest_path dbt_manifest_semantic_types/manifest.json \ --dbt_database ANALYTICS \ --metabase_host localhost:3000 \ --metabase_user eep@... \ --metabase_password '...' \ --metabase_database EPullum_Dev \ --metabase_use_http \ --schema DEV_EPULLUM ```

I am able to run dbt-metabase exposures command and I get a message out that says (for example)

2021-07-23 12:13:11,014 - INFO - Introspecting card: Web and Mobile Event Counts
2021-07-23 12:13:11,015 - INFO - Model extracted from Metabase question: DAILY_ACTIVITY

However, the depends_on section of the exposures.yml file is empty.

metabase_exposures.yml example ``` - name: Web_and_Mobile_Event_Counts description: '### Visualization: Line Event counts by day #### Metadata Metabase Id: __5__ Created On: __2021-07-23T15:48:34.42767Z__' type: analysis url: http://localhost:3000/card/5 maturity: medium owner: name: eep email: eep@... depends_on: [] ```
z3z1ma commented 3 years ago

What is the purpose of the ** around the --dbt_manifest_path argument? That's the first difference that jumps out.

@erika-e Sorry that was just a markdown fail, just meant to emphasize that exposures worked with either folder parser or manifest and did not require manifest specifically.

So -depends_on can only be empty if there is an orphaned exposure in the context of the run (ie no refable model from the dbt parser found matching metabase table name). So another good addition would be the inclusion of an orphaned exposure logging message rather than proceeding quietly.

As far as how or why it happened here, the first place I would look is to see that the question is built on your expected schema DEV_EPULLUM, seems silly but always best to start from the bottom up.

erika-e commented 3 years ago

So -depends_on can only be empty if there is an orphaned exposure in the context of the run (ie no refable model from the dbt parser found matching metabase table name). So another good addition would be the inclusion of an orphaned exposure logging message rather than proceeding quietly.

As far as how or why it happened here, the first place I would look is to see that the question is built on your expected schema DEV_EPULLUM, seems silly but always best to start from the bottom up.

@z3z1ma

Let me make sure I understand a couple of things: --schema I understood to be referring to the schema within the database where the models are being built - in this case for me DEV_EPULLUM - and this argument pertains to dbt and Metabase?

--metabase_database is the name of the database within Metabase. The help text says "Target database name as set in Metabase (typically aliased)"

The name of the database in metabase (on the settings page) is EPullum_Dev and the schema it's connected to is DEV_EPULLUM. In my test instance that's the only database the question could be pulling that table from.

Depends on is blank for all questions/dashboards. I made one that is just select * from test_table, and depends_on is also blank for it. So it does seem like some kind of invocation problem.

z3z1ma commented 3 years ago

@erika-e

Both schema and schema_excludes only restrict the scope of the dbt parser. the same follows for the includes and excludes arguments (so for example with --includes my_model appended to your typical invocation, you can propogate docs / extract exposures to an output file for a single model.

Everything is pulled from Metabase and checked against the subset of dbt models for table (model) name matches from dbt <-> Metabase.

test_table is a model in your manifest/dbt project?

erika-e commented 3 years ago

@erika-e

Both schema and schema_excludes only restrict the scope of the dbt parser. the same follows for the includes and excludes arguments (so for example with --includes my_model appended to your typical invocation, you can propogate docs / extract exposures to an output file for a single model.

Everything is pulled from Metabase and checked against the subset of dbt models for table (model) name matches from dbt <-> Metabase.

test_table is a model in your manifest/dbt project?

Yes, I changed the name from the real one to the more generic test_table but it's definitely there. I checked the manifest to be sure and it's there with the same name. Totally possible it's user error still. 😅 I'll do some more testing and debugging and see if I can pin the problem down.

z3z1ma commented 2 years ago

@gouline All changes based on excellent user usage review as well as my own usage has been implemented. Now I need to integrate any readme updates to point out the renamed flags, but functionally, I dont think there is much more to do. The unit tests are very good in displaying the function on a full mock api with dbt model project seeded generating the exact expectation of auto generated exposure documentation.

z3z1ma commented 2 years ago

@gouline

All requested changes complete. Plus a little cherry in ensuring our config objects input args match our cli flag args verbatim for additional ease of use between the two methods of utilizing the package.

z3z1ma commented 2 years ago

@gouline

All polished up. I think its as good as can be now (and super readable). Thanks for the review and suggestions.