slidoapp / dbt-superset-lineage

Make dbt docs and Apache Superset talk to one another
MIT License
134 stars 18 forks source link

Feature Update #8

Closed MarcinZegar closed 1 year ago

MarcinZegar commented 2 years ago

Problems:

  1. We wanted to pull dashboards by name and not simply by published status
  2. When pushing, we wanted to specify the database by name.
  3. When pushing, we wanted to specify the schema name in case of inter-schema table name overlap.

Solutions:

  1. Added: --superset-dashboard-name as a option for pulling. It can be daisy chained by using --superset-dashboard-name 'Name One' --superset-dashboard-name 'Name 2'.
  2. Added: --superset-db-name to push command
  3. Added: --superset-schema-name to push command

Updated the docs to reflect these changes.

MarcinZegar commented 2 years ago

@one-data-cookie @mrshu @sweco Can you review?

one-data-cookie commented 2 years ago

@MarcinZegar Thank you for taking the time to play around and make the contribution. It's lovely to see the package is useful and it's much appreciated that you invested energy into even improving it! 🙂

Before going into technicalities, could you please help us better understand what you hope to achieve when hitting problems you describe?

We wanted to pull dashboards by name and not simply by published status

What would be the use case for syncing only such a limited number of dashboards? Wouldn't you be losing the much-needed visibility into the full scope of your reporting layer?

I can imagine scenarios for filtering only certain subgroup of dashboards from your database though, be it a particular schema or even dashboards already listed in an existing exposures.yaml file. Wouldn't that be a better approach, compared to manually listing single dashboards?

When pushing, we wanted to specify the database by name.

Is there a reason why not to use superset_db_id for that? In our experience, using names instead of IDs proves to be error-prone as you can often rewrite the name but not ID.

When pushing, we wanted to specify the schema name in case of inter-schema table name overlap.

In the current implementation, the key for matching tables is schema.table rather than just table. It was chosen specifically to avoid inter-schema table-name overlap you mention. Have you encountered a case in which it didn't work for you?

Similar to the first point though, I can imagine you might want to limit the sync to certain schema for other purposes?


Let me know your thoughts. We'll be more than happy to learn more and accept improvements to the code. 🙂

one-data-cookie commented 1 year ago

@MarcinZegar I'm closing this but do let me know if you have any further thoughts. We'll be more than happy to chat!