z3z1ma / dbt-osmosis

Provides automated YAML management, a dbt server, streamlit workbench, and git-integrated dbt model output diff tools
https://z3z1ma.github.io/dbt-osmosis/
Apache License 2.0

When inheriting upstream column annotations, provide case-insensitive configuration. #58

yingyingqiqi closed this issue 1 year ago

yingyingqiqi commented 1 year ago

When inheriting upstream column annotations, please provide a case-insensitive configuration option. I checked https://github.com/z3z1ma/dbt-osmosis/issues/4, which mentions the case issue, but when I tried it, it did not work.

Here is an example. Source YAML:

models/raw_umc/source.yml, which I wrote manually
sources:
  - name: raw_umc
    tables:
    - name: umc_user_device
      description: >
        A table that stores the association between users and devices.
      columns:
        - name: user_id
          data_type: string
          description: User ID, associated with the user table.
        - name: DEVICE_ID
          data_type: string
          description: device_master table incremental ID, device_master table incremental ID
        - name: updated_at
          data_type: string
          description: Last update time.

model:

select * from {{ source('raw_umc', 'umc_user_device') }}

Running the following command successfully generated the stg YAML:

dbt-osmosis yaml document models/stg --project-dir . --profiles-dir .

models/stg/schema/user_device.yml
version: 2
models:
  - name: bau_user_device
    columns:
      - name: USER_ID
        description: ''
      - name: DEVICE_ID
        description: 'device_master table incremental ID, device_master table incremental ID'
      - name: UPDATED_AT
        description: ''
sources: []

As you can see, the uppercase USER_ID and UPDATED_AT columns in stg do not inherit from the lowercase user_id and updated_at columns in raw; only DEVICE_ID, whose case matches exactly, inherits its description. Note that I am using Snowflake, so the returned schema is in uppercase.
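To make the request concrete, here is a minimal Python sketch of the matching behaviour being asked for. It is not dbt-osmosis's actual code: the function `inherit_descriptions` and its `case_insensitive` parameter are hypothetical names invented for illustration, and the column data is copied from the source YAML above.

```python
from typing import Dict, List


def inherit_descriptions(
    upstream: Dict[str, str],
    downstream_columns: List[str],
    case_insensitive: bool = True,
) -> Dict[str, str]:
    """Look up a description for each downstream column among the upstream columns.

    Hypothetical helper for illustration only; not part of dbt-osmosis.
    """

    def key(name: str) -> str:
        # Fold case so that USER_ID and user_id compare equal when requested.
        return name.casefold() if case_insensitive else name

    lookup = {key(name): desc for name, desc in upstream.items()}
    # Columns with no upstream match keep an empty description.
    return {col: lookup.get(key(col), "") for col in downstream_columns}


# Upstream columns as written in models/raw_umc/source.yml.
upstream = {
    "user_id": "User ID, associated with the user table.",
    "DEVICE_ID": "device_master table incremental ID, device_master table incremental ID",
    "updated_at": "Last update time.",
}
# Column names as Snowflake returns them (uppercase).
downstream = ["USER_ID", "DEVICE_ID", "UPDATED_AT"]

# Exact matching reproduces the problem: only DEVICE_ID is inherited.
exact = inherit_descriptions(upstream, downstream, case_insensitive=False)
# Case-insensitive matching is the requested behaviour.
folded = inherit_descriptions(upstream, downstream, case_insensitive=True)
```

With exact matching, `exact["USER_ID"]` and `exact["UPDATED_AT"]` stay empty, which matches the generated stg YAML; with case folding, all three columns pick up their upstream descriptions.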

Possible useful logs:

INFO  🔬 Looking for actions for model.snowflake_dbt.bau_user_device  osmosis.py:1006
INFO  ✨ Schema file is up to date for model model.snowflake_dbt.bau_user_device  osmosis.py:877
INFO  🔬 Looking for actions for model.snowflake_dbt.bau_user_device  osmosis.py:1006
INFO  💡 Column DEVICE_ID is inheriting knowledge from the lineage of progenitor (source.snowflake_dbt.raw_umc.umc_user_device) for model model.snowflake_dbt.bau_user_device  osmosis.py:965
INFO  {'description': 'device_master table incremental ID, device_master table incremental ID'}  osmosis.py:974
INFO  ✨ Schema file /workspac/snowflake_dbt/models/stg/schema/bau_user_device.yml updated  osmosis.py:872
INFO  🔬 Looking for actions for model.snowflake_dbt.bau_user_partner

Question: How can I make the uppercase USER_ID and UPDATED_AT columns in stg inherit from the lowercase user_id and updated_at columns in the upstream raw source? Thank you 🙏