z3z1ma / dbt-osmosis

Provides automated YAML management, a dbt server, a Streamlit workbench, and git-integrated dbt model output diff tools
https://z3z1ma.github.io/dbt-osmosis/
Apache License 2.0

Bug: Disabled sources are cleaned up if source location is set in dbt_project vars #154

Open: kokorin opened this issue 5 months ago

kokorin commented 5 months ago

Intro

In our project we have to support a different set of sources for the dev and prod environments. We found a bug in dbt-osmosis yaml refactor: disabled sources are corrupted if the source file location is set in dbt_project.yml like this:

vars:
  dbt-osmosis:
    dev: "staging/_dev_sources.yml"
    prod: "staging/_prod_sources.yml"
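As a rough illustration of why both files are in play on every run, the sketch below treats the dbt-osmosis var as a plain mapping from source name to YAML path. The helper name resolve_source_files and its models_dir parameter are hypothetical, and resolving paths relative to the models directory is an assumption, not something taken from the dbt-osmosis code.

```python
from pathlib import PurePosixPath


def resolve_source_files(osmosis_vars: dict, models_dir: str = "models") -> dict:
    """Hypothetical helper: map each source name to the YAML file that manages it.

    Nothing here looks at the active target, so with a static mapping both
    the dev and the prod entries are always processed.
    """
    resolved = {}
    for source_name, rel_path in osmosis_vars.items():
        # Assumption: paths are relative to the models directory.
        resolved[source_name] = str(PurePosixPath(models_dir) / rel_path)
    return resolved


osmosis_vars = {
    "dev": "staging/_dev_sources.yml",
    "prod": "staging/_prod_sources.yml",
}
print(resolve_source_files(osmosis_vars))
```

Under this reading, running against the dev target still puts _prod_sources.yml in scope, even though the prod source itself is disabled there.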

Here is a file containing a minimal dbt project that reproduces the behavior described above: dbt-osmosis-disabled-source-bug.zip

How to reproduce

  1. python -m venv .venv
  2. .\.venv\Scripts\activate (Windows) or source .venv/bin/activate (Linux/macOS)
  3. pip install -r requirements.txt
  4. dbt build
  5. dbt build --target prod
  6. check the content of _prod_sources.yml; it should be:
version: 2
sources:
  - name: prod
    database: prod
    schema: main
    config:
      enabled: "{{ target.name == 'prod' }}"
    tables: 
      - name: users
  7. dbt-osmosis yaml refactor
  8. check the content of _prod_sources.yml; it now contains no tables, and the database, schema and config are wrong:
version: 2
sources:
  - name: prod
    database: dev
    schema: prod
    tables: []
  9. check the content of _dev_sources.yml; it is updated correctly:
version: 2
sources:
  - name: dev
    database: dev
    schema: main
    config:
      enabled: "{{ target.name == 'dev' }}"
    tables:
      - name: users
        columns:
          - name: id
            description: ''
            data_type: INT
          - name: name
            description: ''
            data_type: TEXT
          - name: nickname
            description: ''
            data_type: TEXT
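One possible explanation for the broken prod entry, offered only as an unverified hypothesis: a source disabled for the current target is not among the enabled sources, so the tool may treat it as missing and rebuild it from defaults (schema taken from the var key, database from the active target), then fill tables from whatever exists in that schema. The sketch below models that reading; rebuild_source_entry, its parameters, and the defaulting rules are all hypothetical.

```python
def rebuild_source_entry(
    source_name: str,
    enabled_sources: dict,
    target_database: str,
    warehouse_tables: dict,
) -> dict:
    """Hypothetical model of how a source entry could be regenerated."""
    if source_name in enabled_sources:
        # Enabled source: keep its declared database/schema/config.
        entry = dict(enabled_sources[source_name])
    else:
        # Disabled under this target: fall back to defaults.
        # Assumption: schema defaults to the source name and database
        # to the active target's database; config is not carried over.
        entry = {"name": source_name, "database": target_database, "schema": source_name}
    # Sync tables with what exists in <database>.<schema> for this target.
    key = (entry["database"], entry["schema"])
    entry["tables"] = [{"name": t} for t in warehouse_tables.get(key, [])]
    return entry


# Running against the dev target: only the dev source is enabled.
enabled = {
    "dev": {"name": "dev", "database": "dev", "schema": "main",
            "config": {"enabled": "{{ target.name == 'dev' }}"}},
}
tables_in_dev = {("dev", "main"): ["users"]}

print(rebuild_source_entry("prod", enabled, "dev", tables_in_dev))
# {'name': 'prod', 'database': 'dev', 'schema': 'prod', 'tables': []}
```

Under these assumptions the model yields the same shape as the corrupted _prod_sources.yml shown above, while the enabled dev source keeps its config and tables.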
z3z1ma commented 5 months ago

I think you would need to replicate the same target-based logic in the osmosis config:

vars:
  dbt-osmosis:
    dev: "{{ 'staging/_dev_sources.yml' if target.name == 'dev' else None }}"
    prod: "{{ 'staging/_prod_sources.yml' if target.name == 'prod' else None }}"
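The intent of this config is that only the active target's file is listed. Assuming each expression renders to a path for the matching target and to an empty value otherwise (whether dbt-osmosis renders Jinja here is unconfirmed, as the comment below says), the effective mapping per target could be modeled as follows. The function names are hypothetical. Note that a plain Jinja string render turns None into the text "None", so the sketch filters that out too.

```python
from typing import Optional


def render_entry(path: str, source_target: str, active_target: str) -> Optional[str]:
    """Mimic "{{ path if target.name == source_target else None }}"."""
    return path if active_target == source_target else None


def effective_osmosis_vars(active_target: str) -> dict:
    """Keep only the entries that rendered to a real path for this target."""
    raw = {
        "dev": render_entry("staging/_dev_sources.yml", "dev", active_target),
        "prod": render_entry("staging/_prod_sources.yml", "prod", active_target),
    }
    # A real string render would deliver None as the text "None"; drop it as well.
    return {name: path for name, path in raw.items() if path not in (None, "", "None")}


print(effective_osmosis_vars("dev"))   # {'dev': 'staging/_dev_sources.yml'}
print(effective_osmosis_vars("prod"))  # {'prod': 'staging/_prod_sources.yml'}
```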

Not 100% sure this would work, but something along these lines; you might need to point it at a gitignored file. The source synchronization is meant both to add new tables from a source and to remove tables that no longer exist. I don't see why we couldn't have a flag to opt out of that behavior, though.
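The add/remove behavior described in this comment amounts to reconciling declared tables against tables found in the warehouse. A minimal sketch, with a hypothetical prune switch standing in for the suggested opt-out flag (no such flag is claimed to exist in dbt-osmosis):

```python
def sync_tables(declared: list, found: list, prune: bool = True) -> list:
    """Reconcile declared source tables with tables found in the warehouse.

    New tables are always appended. Tables missing from the warehouse are
    dropped only when prune=True (the hypothetical opt-out switch).
    """
    found_set = set(found)
    kept = [t for t in declared if t in found_set or not prune]
    declared_set = set(declared)
    added = [t for t in found if t not in declared_set]
    return kept + added


# The prod source inspected from the dev target: the lookup finds nothing.
print(sync_tables(["users"], []))                   # []
print(sync_tables(["users"], [], prune=False))      # ['users']
print(sync_tables(["users"], ["users", "orders"]))  # ['users', 'orders']
```

With pruning disabled, a source whose tables are invisible under the current target would keep its declared tables instead of being emptied.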