z3z1ma / dbt-osmosis

Provides automated YAML management, a dbt server, streamlit workbench, and git-integrated dbt model output diff tools
https://z3z1ma.github.io/dbt-osmosis/
Apache License 2.0
422 stars 45 forks source link

Request: `yaml refactor`: Quote columns that use SQL keywords (such as `GROUP`) #43

Closed jaredx435k2d0 closed 1 year ago

jaredx435k2d0 commented 1 year ago

Is your feature request related to a problem? Please describe. Sometimes a column in a table may have the name GROUP, ORDER, START, SCHEMA, TABLE, etc. yaml refactor doesn't quote these columns, causing issues when dbt is run

Describe the solution you'd like I'd like yaml refactor to automatically quote column names when they conflict with database keywords

Describe alternatives you've considered Manually modifying the templates with huge if x or x conditions to add quotes if needed

z3z1ma commented 1 year ago

Since yaml refactor is only mutating yaml files, why do columns require comments? The yaml is unrelated to the SQL. In the event one directly interpolates values from the YAML into their SQL, then they should apply quoting there?

I can totally be missing the point, so leaving this open and feel free to clarify :)

jaredx435k2d0 commented 1 year ago

Let's say I have the following _salesforce__sources.yml:

version: 2

sources:
  - name: salesforce
    database: FIVETRAN_DB
    schema: SALESFORCE
    tables:
      - name: group
        identifier: '"GROUP"'
        description: ''
        columns:
          - name: _FIVETRAN_SYNCED
          - name: DOES_INCLUDE_BOSSES
          - name: DOES_SEND_EMAIL_TO_MEMBERS
          - name: TYPE
          - name: RELATED_ID
          - name: DEVELOPER_NAME
          - name: CREATED_BY_ID
          - name: LAST_MODIFIED_DATE
          - name: ID
          - name: SYSTEM_MODSTAMP
          - name: CREATED_DATE
          - name: NAME
          - name: EMAIL
          - name: OWNER_ID
          - name: LAST_MODIFIED_BY_ID
          - name: _FIVETRAN_DELETED

And the following stg_salesforce__group.sql:

SELECT
    ID
,   NAME
,   DEVELOPER_NAME
,   RELATED_ID
,   TYPE
,   EMAIL
,   OWNER_ID
,   DOES_SEND_EMAIL_TO_MEMBERS
,   DOES_INCLUDE_BOSSES
,   CREATED_DATE
,   CREATED_BY_ID
,   LAST_MODIFIED_DATE
,   LAST_MODIFIED_BY_ID
,   SYSTEM_MODSTAMP
,   _FIVETRAN_SYNCED

FROM
    {{ source("salesforce", "group") }}

WHERE
    _FIVETRAN_DELETED IS DISTINCT FROM TRUE

If I remove identifier: '"GROUP"' from my yml file, I'll get the following error when I execute dbt run -s stg_salesforce__group:

Database Error in model stg_salesforce__group (models/staging/salesforce/stg_salesforce__group.sql)
  001003 (42000): SQL compilation error:
  syntax error line 20 at position 27 unexpected 'group'.
  syntax error line 22 at position 0 unexpected 'WHERE'.
  syntax error line 24 at position 8 unexpected ')'.

I tried adding additional quotes inside the source() in my SQL file, but that then looks for a source name in my yml file that is also quoted.

So maybe the best approach would be to add the dbt-approved "identifier" attribute.

Thoughts?

z3z1ma commented 1 year ago

@jaredx435k2d0 I think you may want to take a look at this: https://docs.getdbt.com/reference/project-configs/quoting

It will do exactly what you want completely independent of the YAML having extra quotes wrapped into the string or withut having to use an identifier key.

z3z1ma commented 1 year ago

One more reference for you: https://docs.getdbt.com/reference/resource-properties/quoting

If the project-level config does not work, the source specific one certainly will. I am going to close this issue for now as its unrelated to dbt-osmosis but look forward to your confirmation of success! 😄

jaredx435k2d0 commented 1 year ago

Thanks for the info! Very helpful