gouline / dbt-metabase

dbt + Metabase integration
https://pypi.org/project/dbt-metabase/
MIT License

Parse exposures from Metabase #22

Closed: fernandobrito closed 2 years ago

fernandobrito commented 3 years ago

Moving the discussion to an issue to avoid polluting the discussion on the PR.

@z3z1ma wrote on https://github.com/gouline/dbt-metabase/pull/19 that he has a follow-up PR in mind that adds a feature to automatically create dbt exposures from Metabase questions, so that the dbt lineage graph shows which Metabase questions use which dbt models. More details are in this message as well: https://github.com/gouline/dbt-metabase/pull/19#pullrequestreview-682391872

Just today I open-sourced an internal project that does the same, but between dbt and Tableau: https://github.com/voi-oss/dbt-exposures-crawler.

One thing I would like to ask is what @gouline's scope for dbt-metabase is. Do you want to keep it narrow and focused on documentation syncing, or do you want it to be a comprehensive Metabase <> dbt toolkit, in which case @z3z1ma's feature would fit in dbt-metabase as well?

@z3z1ma, how much of the existing dbt-metabase codebase do you actually use in the exposures feature? Mostly the Metabase API client, I guess?
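To make the idea concrete, here is a minimal sketch of turning one Metabase question into a dbt exposure entry. It is not the code from the PR. The `question` dict is a hypothetical, simplified stand-in for what the Metabase API returns, the `/question/<id>` URL pattern is an assumption, and `build_exposure` and `slugify` are illustrative names. The output keys follow the dbt exposure properties (name, type, url, description, depends_on, owner).

```python
import re


def slugify(title):
    """Turn a question title into a dbt-friendly exposure name."""
    return re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")


def build_exposure(question, model_names, base_url):
    """Build one dbt exposure entry from a simplified question dict.

    `question` is a hypothetical stand-in for a Metabase card, and
    `model_names` are the dbt models the question was found to use.
    """
    return {
        "name": slugify(question["name"]),
        "type": "analysis",
        # Assumed URL pattern for a saved question.
        "url": f"{base_url}/question/{question['id']}",
        "description": question.get("description") or "",
        "depends_on": [f"ref('{name}')" for name in sorted(model_names)],
        "owner": {
            "name": question["creator_name"],
            "email": question["creator_email"],
        },
    }
```

A list of such entries under a top-level `exposures:` key in a yml file is what dbt would pick up for the lineage graph.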

gouline commented 3 years ago

Narrow scope but not just documentation syncing. What you're describing fits fine.

What I want to avoid:

Otherwise, no objections from me.

remigabillet commented 3 years ago

Thank you @gouline for open-sourcing this work. I love this project. We're just starting a data team at AngelList Talent and I think dbt-metabase is going to be a critical part of making DBT+Metabase successful here.

I 👍 the idea of importing exposures from Metabase to DBT. It will greatly improve the DBT experience.

Since you're talking about scope, I thought I would share an idea I've been working on. Thanks for starting this conversation @fernandobrito.

Metabase Metrics (and in particular custom aggregations) are very useful to our end-users. I'm finding myself spending a lot of time in Metabase managing Metrics because the UI is very inconvenient.

I'd like to be able to define a list of Metabase Metrics on DBT models as schema properties. I wonder if this is part of this project's scope. The primary challenge is parsing the Metabase expressions, which are defined as strings (e.g. Distinct(case([step] = "hired", [candidate_user_id]))), into trees that the API can handle. I think the easiest approach to maintain is to compile the JS parser from the Metabase source code into a small Node script that can be committed and called from this project.

This would add a good amount of complexity so it's probably beyond the scope of the project. I thought I would share for the sake of discussion.
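To illustrate the parsing problem described above, here is a rough Python sketch of a recursive-descent parser that turns such an expression string into a nested list. This is an alternative to compiling the Metabase JS parser, not a port of it. The output shape is an assumption made for illustration and is not the format the Metabase API expects. It handles only function calls, [field] references, string and number literals, and a single comparison operator.

```python
import re

# One token per match: [field], "string", number, name, operator, or punctuation.
TOKEN_RE = re.compile(r'''
    \s*(?:
        (?P<field>\[[^\]]+\])      |
        (?P<string>"[^"]*")        |
        (?P<number>\d+(?:\.\d+)?)  |
        (?P<name>[A-Za-z_]\w*)     |
        (?P<op>!=|<=|>=|=|<|>)     |
        (?P<punct>[(),])
    )''', re.VERBOSE)


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if not match:
            raise ValueError(f"unexpected character at position {pos}")
        kind = match.lastgroup
        tokens.append((kind, match.group(kind)))
        pos = match.end()
    return tokens


def parse(text):
    """Parse an expression string into a nested list (illustrative shape)."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def take(expected=None):
        nonlocal pos
        kind, value = peek()
        if kind is None or (expected is not None and value != expected):
            raise ValueError(f"expected {expected!r}, got {value!r}")
        pos += 1
        return kind, value

    def operand():
        kind, value = take()
        if kind == "field":
            return ["field", value[1:-1]]
        if kind == "string":
            return value[1:-1]
        if kind == "number":
            return float(value) if "." in value else int(value)
        if kind == "name":
            # A name must be followed by a parenthesised argument list.
            take("(")
            args = []
            if peek()[1] != ")":
                args.append(comparison())
                while peek()[1] == ",":
                    take(",")
                    args.append(comparison())
            take(")")
            return [value, *args]
        raise ValueError(f"unexpected token {value!r}")

    def comparison():
        left = operand()
        if peek()[0] == "op":
            _, op = take()
            return [op, left, operand()]
        return left

    tree = comparison()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return tree
```

A real implementation would still have to resolve field names to Metabase field IDs and emit whatever structure the API requires, which is where most of the complexity mentioned above would sit.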

gouline commented 3 years ago

Thanks for that @remigabillet, this sounds like a separate feature to what @fernandobrito started this issue about. Please create a separate issue for discussing that, regardless of whether we decide that it's in scope for this project or not.

z3z1ma commented 3 years ago

@fernandobrito

I use the manifest parser with an additional method for extracting ref-able nodes, and the Metabase client (really only the api method). I am going to wrap up any remaining open items on the other PR, and when that is resolved I will open a new one where you can look through the code. I think the code is very clean and simple, and I think you'll both like it. I also think the SQL parsing will be of particular interest.
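As a rough illustration of the two pieces mentioned above (extracting ref-able nodes and parsing SQL), here is a simplified sketch. It is not the PR's implementation. It assumes a `manifest` dict shaped like dbt's manifest.json, with `nodes` entries carrying `resource_type`, `name`, `schema`, and optionally `alias`, and it swaps in a plain regex for real SQL parsing. The names `refable_nodes` and `models_used` are hypothetical.

```python
import re

# Resource types that can be referenced with ref() in dbt.
REFABLE_TYPES = {"model", "seed", "snapshot"}

# Matches schema.table (optionally double-quoted) after FROM or JOIN.
# Regex simplification: no CTEs, subqueries, or three-part names.
TABLE_RE = re.compile(r'\b(?:from|join)\s+"?(\w+)"?\."?(\w+)"?', re.IGNORECASE)


def refable_nodes(manifest):
    """Map (schema, relation) to the dbt node name for ref-able nodes."""
    lookup = {}
    for node in manifest.get("nodes", {}).values():
        if node["resource_type"] in REFABLE_TYPES:
            relation = node.get("alias") or node["name"]
            lookup[(node["schema"].lower(), relation.lower())] = node["name"]
    return lookup


def models_used(sql, lookup):
    """Return the dbt node names whose relations appear in a native query."""
    found = set()
    for schema, table in TABLE_RE.findall(sql):
        name = lookup.get((schema.lower(), table.lower()))
        if name:
            found.add(name)
    return found
```

The resulting set of names is what would feed the `depends_on` list of each generated exposure.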

z3z1ma commented 3 years ago

I have opened PR #28 associated with this.

z3z1ma commented 3 years ago

I expect to add usage updates to the docs and open this up for review this weekend. I will need to rebase and drive any remaining changes, but it's almost ready. I have some testers from the dbt community offering their time, which is great, and I have been consistently running this on top of my production setup to generate and update the yml artifact documenting all the exposures to date.