Open MeltyBot opened 2 years ago
I'm also interested in the idea of "Connections" from the perspective of https://github.com/meltano/meltano/issues/2549 (jupyter integration):
Adding abstraction layers builds out the actual platform effect of Meltano: if I define a connection once, every single plugin could auto-detect connections and their types, and preload a dozen things. Examples that come to mind:
This would provide a significant advantage over using the standalone versions of these tools.
This has been marked as stale because it is unassigned, and has not had recent activity. It will be closed after 21 days if no further activity occurs. If this should never go stale, please add the evergreen label, or request that it be added.
Migrated from GitLab: https://gitlab.com/meltano/meltano/-/issues/3353
Originally created by @kgpayne on 2022-03-28 16:40:45
Problem to solve
As a user of Meltano, I would like a place in meltano.yml to specify Connection config objects that can be referenced and reused in all of the plugins I add to my Meltano Project.
Target audience
Engineers/developers using Meltano with tools that themselves have native constructs for connections, such as dbt, Airflow, Great Expectations and Superset.
Further details
Connections are a ubiquitous construct across the tools of the data platform, and yet in Meltano these details must still be expressed individually for each plugin as plain config or environment variables. This is not only extra work for users of Meltano, it also adds unnecessary human action in the integration path between plugins accessing the same resources (databases, warehouses, SaaS endpoints etc.).
For example, in the case of Snowflake:
From the above it is clear that configuration is commonly expressed in two forms, with tools supporting either or both formats (examples from Airflow's connection docs):
yaml/json:
URI
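To illustrate that the two forms carry the same information, here is a minimal Python sketch that converts a structured (yaml/json-style) connection into a URI and back. The field names (conn_type, login, schema, extra) and the example values are illustrative assumptions, loosely modelled on the kind of fields such tools expose; this is not a reproduction of any tool's exact URI rules.

```python
from urllib.parse import parse_qsl, quote, unquote, urlencode, urlsplit


def connection_to_uri(conn: dict) -> str:
    """Serialise a structured connection (hypothetical field names) into a URI."""
    userinfo = quote(conn["login"], safe="")
    if conn.get("password"):
        # Percent-encode so characters like '@' or '/' cannot break the URI.
        userinfo += ":" + quote(conn["password"], safe="")
    netloc = f"{userinfo}@{conn['host']}"
    if conn.get("port"):
        netloc += f":{conn['port']}"
    uri = f"{conn['conn_type']}://{netloc}/{quote(conn.get('schema', ''), safe='')}"
    if conn.get("extra"):
        uri += "?" + urlencode(conn["extra"])
    return uri


def uri_to_connection(uri: str) -> dict:
    """Parse a URI back into the same structured form."""
    parts = urlsplit(uri)
    return {
        "conn_type": parts.scheme,
        "login": unquote(parts.username or ""),
        "password": unquote(parts.password or ""),
        "host": parts.hostname,
        "port": parts.port,
        "schema": unquote(parts.path.lstrip("/")),
        "extra": dict(parse_qsl(parts.query)),
    }


# Made-up example values, for illustration only.
example = {
    "conn_type": "snowflake",
    "login": "ken",
    "password": "p@ss/word",
    "host": "example.snowflakecomputing.com",
    "port": None,
    "schema": "analytics",
    "extra": {"warehouse": "dev_wh", "role": "transformer"},
}

if __name__ == "__main__":
    uri = connection_to_uri(example)
    print(uri)
    print(uri_to_connection(uri) == example)
```

The structured form is easier to validate and override per field, while the URI form fits in a single environment variable; a shared Connection object could plausibly render either on demand.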
Whilst it is common for tools to support the specification of a 'default' connection, the richness of User/Role/Group functionalities in databases and warehouses for controlling access and for cost management (e.g. monitoring Snowflake credit consumption based on per-tool or per-pipeline User/Role definitions) means that most tools will define or consume several connections based on the execution context. This means a single Project will typically have many more connection definitions than base warehouse and database resources.
Interestingly we already handle connections via a ConnectionService in meltano.core (for use in the older Model functionality), however (as noted in the comments) this has too detailed an understanding of the expected format of each connection type.
Proposal
Expected outcomes:
To achieve these two outcomes, we could:
[...] discovery.yml.
[...] default per connection dialect. The default could be implied I guess, in cases with only one connection?
[...] env_aliases, aliases or the default value field in discovery.yml with env var of the form CONNECTION_<dialect>_<setting name>. E.g.:
[...] run time. I think the easiest way to control which connection populates the CONNECTION_* for each plugin's Context is using plugin extras:
Plugin extras can also be set at runtime using environment variables (as per the docs):
TARGET_SNOWFLAKE__CONNECTION_NAME=snowflake-dbt meltano run tap-gitlab target-snowflake dbt-snowflake:run
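The selection mechanism described above can be sketched roughly as follows. This is only an illustration of the idea, not Meltano's implementation: the CONNECTIONS mapping, its keys, and the helper names are hypothetical, and the sketch assumes that a connection_name extra set through an environment variable takes priority over one set in the plugin definition.

```python
# Hypothetical connection definitions, as they might be declared once per project.
CONNECTIONS = {
    "snowflake-default": {
        "dialect": "snowflake",
        "settings": {"account": "acme", "user": "loader", "role": "LOADER"},
    },
    "snowflake-dbt": {
        "dialect": "snowflake",
        "settings": {"account": "acme", "user": "dbt", "role": "TRANSFORMER"},
    },
}


def extra_env_name(plugin_name: str, extra: str) -> str:
    """Build an env var name of the form <PLUGIN_NAME>__<EXTRA>."""
    return f"{plugin_name.replace('-', '_').upper()}__{extra.upper()}"


def resolve_connection_name(plugin_name: str, plugin_extras: dict, environ: dict):
    """Pick the connection name: env var first (assumed), then the plugin extra."""
    env_value = environ.get(extra_env_name(plugin_name, "connection_name"))
    return env_value or plugin_extras.get("connection_name")


def connection_env(connection: dict) -> dict:
    """Render a connection as CONNECTION_<dialect>_<setting name> variables."""
    dialect = connection["dialect"].upper()
    return {
        f"CONNECTION_{dialect}_{key.upper()}": str(value)
        for key, value in connection["settings"].items()
    }


if __name__ == "__main__":
    environ = {"TARGET_SNOWFLAKE__CONNECTION_NAME": "snowflake-dbt"}
    extras = {"connection_name": "snowflake-default"}
    name = resolve_connection_name("target-snowflake", extras, environ)
    print(name)
    print(connection_env(CONNECTIONS[name]))
```

With the environment variable unset, the same call would fall back to the snowflake-default connection named in the plugin's extras.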
Also, users can always revert to configuring the individual settings manually in the plugin config or via an environment:
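One way to picture this fallback is a lookup in which a manually configured setting wins over a connection-provided value. The precedence shown is an assumption made for illustration (the proposal does not spell it out), and resolve_setting is a hypothetical helper, not an existing Meltano function.

```python
def resolve_setting(name: str, plugin_config: dict, connection_vars: dict, dialect: str):
    """Return the manually configured value if present, else the CONNECTION_* value.

    Assumption: explicit plugin config overrides connection-derived values.
    """
    if name in plugin_config:
        return plugin_config[name]
    return connection_vars.get(f"CONNECTION_{dialect.upper()}_{name.upper()}")


if __name__ == "__main__":
    connection_vars = {
        "CONNECTION_SNOWFLAKE_USER": "dbt",
        "CONNECTION_SNOWFLAKE_ROLE": "TRANSFORMER",
    }
    plugin_config = {"role": "ANALYST"}  # set manually by the user
    print(resolve_setting("role", plugin_config, connection_vars, "snowflake"))
    print(resolve_setting("user", plugin_config, connection_vars, "snowflake"))
```

Under this assumed rule, plugins configured entirely by hand never consult the connection at all, which is consistent with existing projects continuing to work unchanged.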
What does success look like, and how can we measure that?
TAP_GITLAB__CONNECTION_NAME=gitlab_dev TARGET_SNOWFLAKE__CONNECTION_NAME=snowflake_ken DBT__CONNECTION_NAME=snowflake_ken meltano run tap-gitlab target-snowflake dbt:run
(not particularly pretty, but the env vars could also go in a --environment).
[...] discovery.yml.
[...] connection support still work with existing settings and environment variables, 'as-is'.
Reasons Not to Build
Not particularly DRY; many connections will reference the same warehouse or database with different User/Group/Role. This might be solved with connection inheritance or resource referencing in future iterations.
Connections expressed as plugins immediately support inheritance, solving the DRY issue.
Regression test
(Ensure the feature doesn't cause any regressions)
Links / references