kokorin closed 7 months ago
Hi @kokorin 👋🏻 I'm going to diagram what I think you're proposing. Let me know if I have it right.
Step 1: Have a project! In this case, we'll call this project core. For simplicity, let's assume that only marts are public.
flowchart LR
subgraph Core
direction LR
source_a --> stg_source__a --> mart_one
source_b --> stg_source__b --> mart_one
source_c --> stg_source__c --> mart_two
end
Step 2: Programmatically create a shim project, called core_api, which defines ephemeral models that reference sources, where each source is a public model from core. Note that ephemeral models are always set to protected access, which still allows for referencing when installed as a package.
flowchart LR
subgraph Core
direction LR
source_a --> stg_source__a --> mart_one
source_b --> stg_source__b --> mart_one
source_c --> stg_source__c --> mart_two
end
subgraph core_api
direction LR
mart_one -.-> core.mart_one --> core_api.mart_one[mart_one]
mart_two -.-> core.mart_two --> core_api.mart_two[mart_two]
end
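Step 2 could be sketched roughly as follows. This is only an illustration, not an existing dbt feature: the function names (public_models, shim_sql, sources_yaml) are hypothetical, and it assumes that model nodes in manifest.json carry resource_type, package_name, name and access fields, and that all public models live in a single schema that the caller passes in.

```python
def public_models(manifest, project):
    """Return the public models of `project`, sorted by name.

    Assumes each manifest node has `resource_type`, `package_name`,
    `name` and `access` keys (an assumption about the manifest layout).
    """
    nodes = manifest.get("nodes", {}).values()
    found = [
        n for n in nodes
        if n.get("resource_type") == "model"
        and n.get("package_name") == project
        and n.get("access") == "public"
    ]
    return sorted(found, key=lambda n: n["name"])


def shim_sql(project, model_name):
    """Render the ephemeral shim model that selects from the source."""
    config = "{{ config(materialized='ephemeral') }}"
    # Doubled braces in the f-string produce literal Jinja delimiters.
    select = f"select * from {{{{ source('{project}', '{model_name}') }}}}"
    return f"{config}\n{select}\n"


def sources_yaml(project, models, schema):
    """Render a sources file declaring each public model as a table.

    `schema` is supplied by the caller; this sketch assumes one schema
    for all public models.
    """
    lines = [
        "version: 2",
        "sources:",
        f"  - name: {project}",
        f"    schema: {schema}",
        "    tables:",
    ]
    lines += [f"      - name: {m['name']}" for m in models]
    return "\n".join(lines) + "\n"
```

In practice the manifest dictionary would come from json.load on core's manifest.json, and the rendered strings would be written into the models directory of core_api, one SQL file per public model.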
Step 3: Import the core_api project as a package. In this case, let's call our downstream project consumer.
flowchart LR
subgraph Core
direction LR
source_a --> stg_source__a --> mart_one
source_b --> stg_source__b --> mart_one
source_c --> stg_source__c --> mart_two
end
subgraph consumer
subgraph core_api
direction LR
mart_one -.-> core.mart_one --> core_api.mart_one[mart_one]
mart_two -.-> core.mart_two --> core_api.mart_two[mart_two]
end
core_api.mart_one --> mart_three
core_api.mart_two --> mart_three
end
As described above, I believe it should be possible to reference public marts from core within consumer without needing to have a manifest.json artifact.
This creates a few more challenges, of course: what information would one use to programmatically create core_api (probably a manifest.json file 😁), where would this API be generated and where should it reside, when should generation of core_api be triggered, what documentation do we want to bring over, etc.
Let me know if you end up exploring this approach! It sounds interesting, and may be useful for people who cannot store artifacts in an accessible location but can access other projects' source code.
The current solution requires manifest.json to be present before projects that depend on an upstream project can be compiled. In some cases that is not convenient. Instead, I think it could be useful to generate an API package from an existing dbt project; that package could then be used like any ordinary dbt package.
Some details of the idea:
- Public models of the core project are exposed as sources.
- The core_api project contains models which just do select * from {{ source('core', 'public_model_name') }}
- If a core model is versioned, several SQL files can be created to reflect the model versions, like select * from {{ source('core', 'public_model_name_v2') }}
- dbt_project.yml is copied from core to core_api with all model configurations excluded except those for public models, and of course with the name changed from core to core_api.
Do you think it can work? Do you see any issues with that?
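The last two points of the idea (versioned models and the trimmed dbt_project.yml) might look something like the sketch below. It is hedged heavily: versioned_source_name and api_project_config are hypothetical helpers, the project config is handled as an already-parsed dictionary, and it assumes for simplicity that model configurations under models.core are keyed directly by model name, which a real dbt_project.yml (usually keyed by folder path) may not satisfy.

```python
import copy


def versioned_source_name(model_name, version=None):
    """Name of the source table for a model version.

    Follows the `public_model_name_v2` pattern from the proposal;
    unversioned models keep their plain name.
    """
    if version is None:
        return model_name
    return f"{model_name}_v{version}"


def api_project_config(core_cfg, public_names, api_name="core_api"):
    """Derive the core_api project config from core's parsed config.

    Keeps only configurations of public models and renames the project.
    Assumes configs are keyed by model name (a simplification).
    """
    cfg = copy.deepcopy(core_cfg)  # leave the original config untouched
    core_name = cfg["name"]
    core_models = cfg.get("models", {}).get(core_name, {})
    kept = {k: v for k, v in core_models.items() if k in public_names}
    cfg["name"] = api_name
    cfg["models"] = {api_name: kept}
    return cfg
```

For example, api_project_config(core_cfg, {"mart_one"}) would return a copy of the config named core_api whose models section only retains the mart_one entry.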