gouline / dbt-metabase

dbt + Metabase integration
https://pypi.org/project/dbt-metabase/
MIT License

Sync metabase exposures as a list of files instead of a single file #102

Closed Startouf closed 8 months ago

Startouf commented 2 years ago

To make code exploration and review easier, would it be possible to replicate Metabase's file and folder hierarchy when generating exposures? For example, by reusing the slug:

/metabase-exposures/question/42-my-awesome-question.yml
/metabase-exposures/dashboard/84-my-super-dashboard

Or maybe just the ID, to avoid too many file changes when something is merely renamed?

/metabase-exposures/cards/1.yml
/metabase-exposures/cards/2.yml
/metabase-exposures/dashboards/1.yml

Currently all exposures are written to a single file, which is messy.
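To make the two naming options above concrete, here is a minimal sketch of how a per-item file path could be derived. It is not dbt-metabase code: `slugify`, `exposure_path`, and the `use_slug` flag are hypothetical names introduced for illustration, and the slug rule is an assumption rather than Metabase's own.

```python
import re
from pathlib import PurePosixPath


def slugify(name: str) -> str:
    """Lowercase the name and collapse non-alphanumeric runs into hyphens.

    Assumption: a simple slug rule, not necessarily the one Metabase uses.
    """
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def exposure_path(kind: str, item_id: int, name: str, use_slug: bool = True) -> PurePosixPath:
    """Build the target file path for one exposure (hypothetical helper).

    With use_slug=True the file name contains the item name, so a rename
    changes the path; with use_slug=False only the ID is used, so it doesn't.
    """
    root = PurePosixPath("metabase-exposures")
    stem = f"{item_id}-{slugify(name)}" if use_slug else str(item_id)
    return root / kind / f"{stem}.yml"


slug_path = exposure_path("question", 42, "My Awesome Question")
id_path_before = exposure_path("cards", 1, "Old name", use_slug=False)
id_path_after = exposure_path("cards", 1, "New name", use_slug=False)
```

With the ID-only variant, `id_path_before` and `id_path_after` are identical, which is the property that keeps renames out of the diff.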

JGrubb commented 10 months ago

Hey there, same idea, but I'm starting with a very mature Metabase install with several thousand exposures, and I don't want to create a file for each. I also thought of borrowing the collection folder structure from Metabase, but writing one exposures.yml per collection, which at least brings the file count below 1000. The exposures structure would look like this:

metabase-exposures/foo/exposures.yml
metabase-exposures/foo/bar/exposures.yml
metabase-exposures/fin/exposures.yml
metabase-exposures/fin/baz/exposures.yml
metabase-exposures/fin/baz/bat/exposures.yml

This is sort of an in-between of the original suggestion and the current status quo. Any opinions on this approach before I start cracking at it?
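As a rough sketch of the per-collection layout described above, exposures could be grouped by their collection path before writing. This is not the dbt-metabase implementation: `group_by_collection` and the `collection` key (a list of folder names) are assumptions made for illustration.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_collection(exposures, root="metabase-exposures"):
    """Map each target exposures.yml path to the exposure names it would hold.

    Assumption: every exposure is a dict with a "name" and a "collection"
    key, where "collection" lists the folder names from the root down.
    """
    files = defaultdict(list)
    for exposure in exposures:
        target = PurePosixPath(root, *exposure["collection"]) / "exposures.yml"
        files[str(target)].append(exposure["name"])
    return dict(files)


# Made-up sample data mirroring the folder names used in the comment above.
sample = [
    {"name": "revenue_dashboard", "collection": ["fin"]},
    {"name": "churn_question", "collection": ["fin", "baz"]},
    {"name": "cohort_question", "collection": ["fin", "baz"]},
    {"name": "signup_funnel", "collection": ["foo", "bar"]},
]
files = group_by_collection(sample)
```

Here four exposures end up in three files, so the file count scales with the number of collections rather than the number of questions and dashboards.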

gouline commented 9 months ago

There's a chance this will make it into the upcoming 1.0 release (no promises). My preference is @JGrubb's approach.

Startouf commented 9 months ago

My main goal was to make code review easier during data reviews. GitHub hides diffs for large files and, if I recall correctly, refuses to render them at all for even larger files; this is the main thing we want to avoid.

Frequently renaming questions could lead to a lot of diffs, so I wasn't really convinced by the slug approach (although it makes code reviews friendlier and duplicate questions easier to spot).

For an initial sync, syncing all questions on a mature Metabase install would yield a lot of files, but I believe the diffs would be more digestible during code reviews.

I'm not sure what the cons of having too many files are (slower search? having to open many different files to find what you need?).

Also, since questions and dashboards can be moved anywhere, files would occasionally be renamed when items are moved around or folders are renamed.

To be honest, I've always found the limitation of a question belonging to only a single folder quite an annoyance. When a question is used by many different teams, I'd have liked to put it in the folders of all the teams likely to use it (even Google Drive started using shortcuts) to avoid people duplicating it by mistake.

This is only my opinion; in any case I'd be happy with any solution. I would also choose the least complex implementation code-wise and leave adjustments to future updates.