nicholasyager / dbt-loom

A dbt-core plugin to weave together multi-project dbt-core deployments
The Unlicense
114 stars 18 forks

Discussion: Generate API DBT project instead of ingesting models parsed from manifest #37

Closed kokorin closed 7 months ago

kokorin commented 7 months ago

The current solution requires an upstream project's manifest.json to be present before projects that depend on it can be compiled. In some cases that is not convenient. Instead, I think it could be useful to generate an API package from an existing DBT project; that package could then be used like any ordinary DBT package.

Some details of the idea:

  1. Public models in the core project are exposed as sources.
  2. Ephemeral models are created in the core_api project; each one just does select * from {{ source('core', 'public_model_name') }}
  3. If a public core model is versioned, several SQL files can be created to reflect the model versions, e.g. select * from {{ source('core', 'public_model_name_v2') }}
  4. dbt_project.yml is copied from core to core_api with all model configurations excluded except those for public models, and of course with the name changed from core to core_api.
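
To make the four points above concrete, here is a minimal Python sketch of what such a generator might emit. Everything named here (`PublicModel`, `render_api_project`, the `models/_sources.yml` path) is hypothetical and not part of dbt-loom. It also simplifies point 4 by writing a bare dbt_project.yml rather than copying and filtering the one from core, and it leaves out the database and schema that a real source definition would need.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PublicModel:
    """Hypothetical description of one public model in the upstream project."""

    name: str
    version: Optional[int] = None  # None for unversioned models

    @property
    def table_name(self) -> str:
        # Follows the <name>_v<version> pattern used in the proposal;
        # unversioned models keep their plain name.
        return self.name if self.version is None else f"{self.name}_v{self.version}"


def render_api_project(upstream: str, models: list[PublicModel]) -> dict[str, str]:
    """Return {relative_path: contents} for a generated <upstream>_api project."""
    api_name = f"{upstream}_api"
    # Simplification: a minimal project file instead of a filtered copy.
    files = {"dbt_project.yml": f"name: {api_name}\nconfig-version: 2\n"}

    source_lines = ["version: 2", "sources:", f"  - name: {upstream}", "    tables:"]
    for model in models:
        # Point 1: each public model becomes a source table.
        source_lines.append(f"      - name: {model.table_name}")
        # Points 2 and 3: one ephemeral pass-through model per model version.
        files[f"models/{model.table_name}.sql"] = (
            "{{ config(materialized='ephemeral') }}\n"
            f"select * from {{{{ source('{upstream}', '{model.table_name}') }}}}\n"
        )
    files["models/_sources.yml"] = "\n".join(source_lines) + "\n"
    return files


if __name__ == "__main__":
    generated = render_api_project(
        "core", [PublicModel("mart_one"), PublicModel("mart_two", version=2)]
    )
    for path, contents in generated.items():
        print(f"--- {path}\n{contents}")
```

Returning the files as a dictionary rather than writing them to disk keeps the sketch easy to inspect; a real generator would also have to decide where the output lives.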

Do you think it can work? Do you see any issues with that?

nicholasyager commented 7 months ago

Hi @kokorin 👋🏻 I'm going to diagram what I think you're proposing. Let me know if I have it right.

Step 1: Have a project! In this case, we'll call this project core. For simplicity, let's assume that only marts are public.

flowchart LR
  subgraph Core
    direction LR
    source_a --> stg_source__a --> mart_one
    source_b --> stg_source__b --> mart_one
    source_c --> stg_source__c --> mart_two
  end

Step 2: Programmatically create a shim project, called core_api, which defines ephemeral models that reference sources, where each source is a public model from core. Note that ephemeral models are always set to protected access, which still allows for referencing when installed as a package.

flowchart LR
  subgraph Core
    direction LR
    source_a --> stg_source__a --> mart_one
    source_b --> stg_source__b --> mart_one
    source_c --> stg_source__c --> mart_two
  end

  subgraph core_api
    direction LR
    mart_one -.-> core.mart_one --> core_api.mart_one[mart_one]
    mart_two -.-> core.mart_two --> core_api.mart_two[mart_two]
  end
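
One way the "programmatically create" part of Step 2 could find its inputs is to read them from core's manifest. The sketch below is an assumption-laden illustration, not dbt-loom code: `public_models` and `load_manifest` are hypothetical helpers, and the node fields used (`resource_type`, `package_name`, `access`, `name`, `version`, `database`, `schema`, `alias`) reflect the manifest schema of recent dbt versions as I understand it, so check them against the dbt version in use.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def load_manifest(path: str) -> dict[str, Any]:
    """Read a manifest.json file into a plain dictionary."""
    return json.loads(Path(path).read_text())


def public_models(manifest: dict[str, Any], project: str) -> list[dict[str, Any]]:
    """Collect the public models that belong to `project`."""
    found = []
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, ...
        if node.get("package_name") != project:
            continue  # skip models that come from installed packages
        if node.get("access") != "public":
            continue  # only public models are exposed through the shim
        found.append(
            {
                "name": node["name"],
                "version": node.get("version"),
                "database": node.get("database"),
                "schema": node.get("schema"),
                # Assumed: the alias is the relation name in the warehouse.
                "identifier": node.get("alias") or node["name"],
            }
        )
    return sorted(found, key=lambda m: (m["name"], str(m["version"])))
```

With a compiled core project, `public_models(load_manifest("target/manifest.json"), "core")` would list the models that each need a source entry and an ephemeral model in core_api.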

Step 3: Import the core_api project as a package. In this case, let's call our downstream project consumer.

flowchart LR
  subgraph Core
    direction LR
    source_a --> stg_source__a --> mart_one
    source_b --> stg_source__b --> mart_one
    source_c --> stg_source__c --> mart_two
  end

  subgraph consumer
    subgraph core_api
      direction LR
      mart_one -.-> core.mart_one --> core_api.mart_one[mart_one]
      mart_two -.-> core.mart_two --> core_api.mart_two[mart_two]
    end

    core_api.mart_one --> mart_three
    core_api.mart_two --> mart_three
  end

As described above, I believe it should be possible to reference public marts from core within consumer without needing to have a manifest.json artifact.
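
As a rough illustration of the consumer side, the sketch below renders a packages.yml entry and the `ref` expressions a model such as mart_three might use. The helper names and the `../core_api` path are hypothetical. The `local:` package type and the two-argument form of `ref` are existing dbt features, but whether a local path, a git URL, or something else suits a given setup is left open here.

```python
def render_packages_yml(api_project_path: str) -> str:
    """packages.yml for the consumer, installing core_api from a local path."""
    return f"packages:\n  - local: {api_project_path}\n"


def ref_expr(package: str, model: str) -> str:
    """Two-argument ref, which names the package the model comes from."""
    return f"{{{{ ref('{package}', '{model}') }}}}"


if __name__ == "__main__":
    # Hypothetical location of the generated shim project.
    print(render_packages_yml("../core_api"))
    # Expressions that mart_three could select from.
    for model in ("mart_one", "mart_two"):
        print(ref_expr("core_api", model))
```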

This creates a few more challenges, of course. What information would one use to programmatically create core_api (probably a manifest.json file 😁)? Where would this API be generated? When should generation of core_api be triggered, and where should the result reside? What documentation do we want to bring over?
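
On the documentation question, one possible answer is to carry model and column descriptions over into the generated source definitions. The sketch below assumes the input is a model node taken from manifest.json, with a `description` string and a `columns` mapping keyed by column name; `source_table_with_docs` is a hypothetical helper, and the resulting dictionary would still need to be serialized to YAML by whatever tool writes the shim project.

```python
from __future__ import annotations

from typing import Any


def source_table_with_docs(node: dict[str, Any]) -> dict[str, Any]:
    """Build one source table entry, keeping only non-empty documentation."""
    entry: dict[str, Any] = {"name": node.get("alias") or node["name"]}
    if node.get("description"):
        entry["description"] = node["description"]
    # Keep only the columns that actually carry a description.
    columns = [
        {"name": col["name"], "description": col["description"]}
        for col in node.get("columns", {}).values()
        if col.get("description")
    ]
    if columns:
        entry["columns"] = columns
    return entry
```

Tests, meta, and tags are deliberately left out of this sketch; whether they belong in the shim is part of the same open question.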

Let me know if you end up exploring this approach! It sounds interesting, and may be useful for people who cannot store artifacts in an accessible location but can access other projects' source code.