slidoapp / dbt-superset-lineage

Make dbt docs and Apache Superset talk to one another
MIT License

Feature request: Create datasets from dbt models if not present in Superset #42

Open rohitsanj opened 8 months ago

rohitsanj commented 8 months ago

Hi! I'd like to start a conversation about having this library also support creating a dataset when one does not exist in Superset, either as part of the push_descriptions command or as an entirely new command, say, create_datasets.

Automatically creating datasets would address two use cases:

  1. Ensuring that new dbt models are synced into Superset without anyone having to create them explicitly in Superset itself.
  2. Syncing existing dbt models into a freshly provisioned Superset instance, which again removes the effort of creating the corresponding datasets by hand (via a separate script or manual actions in the Superset UI).
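Both use cases come down to the same set difference: dbt models whose (schema, table) pair has no matching Superset dataset. Below is a minimal sketch, assuming the existing datasets have already been fetched (for example from Superset's GET /api/v1/dataset/ endpoint) and reduced to (schema, table_name) pairs. The function names models_missing_datasets and dataset_payload are hypothetical, not part of this library, and the payload carries only the minimal fields for Superset's POST /api/v1/dataset/ endpoint.

```python
from typing import Iterable


def models_missing_datasets(manifest: dict, existing: Iterable[tuple[str, str]]) -> list[dict]:
    """Hypothetical helper: return model nodes from a parsed manifest.json
    that have no matching Superset dataset.

    `existing` holds (schema, table_name) pairs for datasets already in Superset.
    Matching is exact; a real implementation may need to normalize identifier casing.
    """
    existing_keys = set(existing)
    missing = []
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, etc.
        # A model is built under its alias when one is set, otherwise under its name.
        table = node.get("alias") or node["name"]
        if (node["schema"], table) not in existing_keys:
            missing.append(node)
    return missing


def dataset_payload(node: dict, database_id: int) -> dict:
    """Hypothetical helper: minimal request body for POST /api/v1/dataset/."""
    return {
        "database": database_id,
        "schema": node["schema"],
        "table_name": node.get("alias") or node["name"],
    }


# Usage: "orders" already exists in Superset, so only "customers" needs a dataset.
manifest = {"nodes": {
    "model.shop.orders": {"resource_type": "model", "name": "orders", "alias": "orders", "schema": "analytics"},
    "model.shop.customers": {"resource_type": "model", "name": "customers", "alias": None, "schema": "analytics"},
    "test.shop.not_null_orders_id": {"resource_type": "test", "name": "not_null_orders_id", "schema": "analytics"},
}}
missing = models_missing_datasets(manifest, existing=[("analytics", "orders")])
print([dataset_payload(n, database_id=1) for n in missing])
# [{'database': 1, 'schema': 'analytics', 'table_name': 'customers'}]
```

The same diff serves both cases: on a fresh Superset instance the existing set is empty, so every model gets a payload.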

This PR (https://github.com/rohitsanj/dbt-superset-lineage/pull/1), opened against my own fork of this repo, adds a new flag, create_dataset_if_not_exists, to the push_descriptions command. It also adds a new folder, dbt_schemas, containing the dbt manifest JSON schema and the pydantic models generated from it. These models are used to parse the dbt manifest.json file, which gives helpful type hints during development and automatic data validation at runtime.
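The PR relies on pydantic models generated from the manifest JSON schema. As a dependency-free illustration of the same idea (typed access plus validation when the manifest is loaded), here is a sketch that swaps in stdlib dataclasses for pydantic. DbtModel and load_models are hypothetical names, and only a handful of manifest fields are modeled.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DbtModel:
    """Hypothetical typed view of a model node in dbt's manifest.json."""
    unique_id: str
    name: str
    schema: str
    alias: Optional[str] = None
    description: str = ""

    def __post_init__(self) -> None:
        # Fail fast on malformed nodes instead of erroring later during a sync.
        for field_name in ("unique_id", "name", "schema"):
            value = getattr(self, field_name)
            if not isinstance(value, str) or not value:
                raise ValueError(f"{self.unique_id!r}: {field_name} must be a non-empty string")


def load_models(manifest_json: str) -> list[DbtModel]:
    """Parse manifest.json text and return only the model nodes, validated."""
    manifest = json.loads(manifest_json)
    return [
        DbtModel(
            unique_id=unique_id,
            name=node.get("name"),
            schema=node.get("schema"),
            alias=node.get("alias"),
            description=node.get("description") or "",
        )
        for unique_id, node in manifest.get("nodes", {}).items()
        if node.get("resource_type") == "model"
    ]


# Usage: the seed node is filtered out; the model node is validated and typed.
text = json.dumps({"nodes": {
    "model.shop.orders": {"resource_type": "model", "name": "orders", "schema": "analytics",
                          "description": "One row per order"},
    "seed.shop.countries": {"resource_type": "seed", "name": "countries", "schema": "analytics"},
}})
models = load_models(text)
print([m.unique_id for m in models])  # ['model.shop.orders']
```

Generated pydantic models cover the full schema and coerce types automatically; the hand-written checks here only approximate that for three required fields.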

I'd love to hear the community's thoughts on this, and whether others have come across the need for such a feature. Thanks!