gouline / dbt-metabase

dbt + Metabase integration
https://pypi.org/project/dbt-metabase/
MIT License
442 stars 63 forks source link

Exposure parsing output options transiency #54

Closed z3z1ma closed 9 months ago

z3z1ma commented 2 years ago

I think its very valid for someone to want to parse exposures in CI/CD taking a manifest.json as an input, modifying it, and outputting an updated manifest.json. As of today, we output a yaml which is easily readable, can be committed to source control, etc. but might not be desired by a user who just wishes to see exposures in docs without the additional yml. Lets add a cli flag to make the execution optionally manipulate the artifact instead of generating yml to be consumed by subsequent dbt command.

This paves way for my next PR which involves making the data catalogue explorable through metabase table explorer. Stay tuned for these. Will updates this issue as I go if needed.

z3z1ma commented 2 years ago
{

  "fqn": [
    "DBT_PROJECT",
    "Uptime_Progress_Bar"
  ],

  "unique_id": "exposure.DBT_PROJECT.Uptime_Progress_Bar",

  "package_name": "DBT_PROJECT",

  "root_path": "",
  "path": "",
  "original_file_path": "",

  "resource_type": "exposure",

  "depends_on": {
    "macros": [],
    "nodes": [
      "model.DBT_PROJECT.fct_core_unit_uptime_stats"
    ]
  },

  "refs": [
    [
      "fct_core_unit_uptime_stats"
    ]
  ],

  "sources": [],

  "created_at": 1629059221

}

Putting this here as an FYI to myself. These are the keys remaining to populate if directly updating artifact which can be derived with 1 additional func arg. :) namely, DBT_PROJECT

fernandobrito commented 2 years ago

Interesting that you mentioned this, @z3z1ma 😄

In my other project where I create exposures for Tableau dashboards (https://github.com/voi-oss/dbt-exposures-crawler), currently, I do write directly in the manifest.json. Here is how I do it: https://github.com/voi-oss/dbt-exposures-crawler/blob/main/src/exposurescrawler/dbt/manifest.py#L38.

Note that besides adding the exposure to exposures at the root of the manifest, there are also the fields parent_map and child_map that should be updated: https://docs.getdbt.com/reference/artifacts/manifest-json. Now I realized that in my code I'm only updating parent_map and it seems I missed updating child_map, but everything is working as expected in the generated dbt docs catalog.

It's fun because in my project I will very soon also add support for writing the exposures to files like you did here, because we will now update the docs much more often (every 30 minutes or 1 hour, as we are adding some extra data ourselves in the description of the models), and I want to decouple updating the docs from running the exposures automation (which will be solved by having the YAML files in the repository).

z3z1ma commented 2 years ago

Note that besides adding the exposure to exposures at the root of the manifest, there are also the fields parent_map and child_map that should be updated: https://docs.getdbt.com/reference/artifacts/manifest-json. Now I realized that in my code I'm only updating parent_map and it seems I missed updating child_map, but everything is working as expected in the generated dbt docs catalog.

This is what I was missing. parent_map, Thank you.

It's fun because in my project I will very soon also add support for writing the exposures to files like you did here, because we will now update the docs much more often (every 30 minutes or 1 hour, as we are adding some extra data ourselves in the description of the models), and I want to decouple updating the docs from running the exposures automation (which will be solved by having the YAML files in the repository).

This is funny, we go back and forth there ha. My main driving factor in going this direction is I have an upcoming PR which is going to embed frames in the manifest model descriptions that let us use metabase to explore the tables directly in the dbt docs.

table_dataset = base64.encode('{"name":null,"dataset_query":{"database":DATABASE_ID,"type":"query","query":{"source-table":TABLE_ID}},"display":"table","visualization_settings":{}}')
f'<iframe src="{metabase_url}/question#{table_dataset}" />' 

to put it simply. ofc the table_dataset only needs the source-table and the database

dbt docs model descriptions support HTML but you would not want the html in your YAML documentation nor would I want to touch users existing documentation to insert HTML markup. But modifying the manifest like you did will afford us to be more liberal and provide us a very cool feature in my company who have been happily using the docs and will make the site like a lightweight sort of amundsen but very useful.

z3z1ma commented 2 years ago

After some deeper exploration, it seems Metabase compiles with strict "frame-ancestors 'none'" as its resource policy. Self wouldve been good I think but thats besides the point. It essentially means you'll get this with a direct embed test. The only way around would require using the built in embed feature of metabase and a lot of fancy python/API work and Id question the security.

image

So the answer is going to be using links instead, which will work just as well IMO. We will use the same base64 encoding of the above json.

z3z1ma commented 2 years ago

Like this 🥇

image

fernandobrito commented 2 years ago

Hmm, very nice, thanks for sharing the findings!

We are currently working on some "data quality" metrics/score card so our consumers can get a better feeling how much they can trust the datasets they are using. Not open source yet, but let's see how it goes. It's basically some Python code appending a string to the description field in the manifest.json as well.

Example:

image

But I agree with you that we can only get so far with dbt docs catalog. Probably next year my team will take a better look at DataHub (which has a dbt integration to read the docs from dbt), Amundsen, etc.