meltano / meltano

Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
https://meltano.com/
MIT License

New family of commands for catalog and profiler artifacts - 'meltano catalog' or 'meltano snapshot' or similar #3421

Open MeltyBot opened 2 years ago

MeltyBot commented 2 years ago

Migrated from GitLab: https://gitlab.com/meltano/meltano/-/issues/3505

Originally created by @aaronsteers on 2022-05-18 15:46:29


The use case here is for users to create artifacts detailing the data structures and data profiling outputs associated with their project. Over time, Meltano could expand the ways in which this data is used.

Today, we have much of this data in .meltano internal artifacts (such as the Singer catalog files), but we don't have any well-defined means of working with these artifacts, we don't provide holistic diff/compare options, and we don't have a single place where a user could publish or search their schema definitions (for instance).

Schema catalog change detection

meltano catalog snapshot create all # Create artifacts for all taps' schema and for known dbt models' schemas
meltano catalog snapshot update all # Update artifacts
meltano catalog snapshot diff tap-gitlab --from=<old-path> --to=<new-path> # Print a diff of just the tap-gitlab artifacts

The user could presumably choose whether they want these artifacts committed to their repo or not.
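
To make the `diff` idea concrete, here is a minimal Python sketch of comparing two snapshot files. It assumes each snapshot is a Singer catalog JSON document (a top-level `streams` list whose entries carry a `tap_stream_id` and a `schema` with `properties`). The function names and the shape of the returned report are hypothetical and only illustrate one possible output, not a proposed Meltano interface.

```python
import json
from pathlib import Path


def load_streams(catalog: dict) -> dict:
    """Map each stream's tap_stream_id to its schema properties."""
    return {
        stream["tap_stream_id"]: stream.get("schema", {}).get("properties", {})
        for stream in catalog.get("streams", [])
    }


def diff_catalogs(old: dict, new: dict) -> dict:
    """Report stream-level and property-level changes between two catalogs."""
    old_streams, new_streams = load_streams(old), load_streams(new)
    report = {
        "added_streams": sorted(new_streams.keys() - old_streams.keys()),
        "removed_streams": sorted(old_streams.keys() - new_streams.keys()),
        "changed_streams": {},
    }
    # Only streams present in both snapshots can have property-level changes.
    for name in sorted(old_streams.keys() & new_streams.keys()):
        before, after = old_streams[name], new_streams[name]
        change = {
            "added": sorted(after.keys() - before.keys()),
            "removed": sorted(before.keys() - after.keys()),
            # "retyped" means the property's schema differs in any way.
            "retyped": sorted(
                key for key in before.keys() & after.keys()
                if before[key] != after[key]
            ),
        }
        if any(change.values()):
            report["changed_streams"][name] = change
    return report


def diff_catalog_files(old_path: str, new_path: str) -> dict:
    """Hypothetical file-based wrapper, mirroring the --from/--to options."""
    old = json.loads(Path(old_path).read_text())
    new = json.loads(Path(new_path).read_text())
    return diff_catalogs(old, new)
```

A real implementation would probably also need to compare the catalog `metadata` entries (selection, replication method), which this sketch ignores.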

Community plugin first approach

This is a big undertaking and would likely need to go through multiple iterations before we land on a stable interface.

To allow faster iteration, this could in theory be built first as a utility plugin and published to the hub.

meltano add utility meltano-catalog-util
meltano run meltano-catalog-util:create all
meltano run meltano-catalog-util:update all
meltano run meltano-catalog-util:diff tap-gitlab
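
As a rough illustration of what the utility's own entry point could look like, here is a Python `argparse` skeleton exposing the three commands above. Everything in it is an assumption: `meltano-catalog-util` does not exist yet, and the argument names simply mirror the commands proposed in this issue. How Meltano would map `plugin:command` invocations onto this CLI is left open.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical CLI with create, update and diff subcommands."""
    parser = argparse.ArgumentParser(prog="meltano-catalog-util")
    subcommands = parser.add_subparsers(dest="command", required=True)

    # 'create' and 'update' take a single target: a plugin name or 'all'.
    for name in ("create", "update"):
        sub = subcommands.add_parser(name)
        sub.add_argument("target", help="plugin name, or 'all'")

    # 'diff' compares two snapshot locations for one target.
    diff = subcommands.add_parser("diff")
    diff.add_argument("target", help="plugin name, e.g. tap-gitlab")
    # 'from' is a Python keyword, so store the values under explicit names.
    diff.add_argument("--from", dest="from_path", help="older snapshot path")
    diff.add_argument("--to", dest="to_path", help="newer snapshot path")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Iterating on a standalone CLI like this first would let the argument shapes settle before anything is promoted into core `meltano` commands.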
MeltyBot commented 2 years ago

View 4 previous comments from the original issue on GitLab