starburstdata / dbt-trino

The Trino (https://trino.io/) adapter plugin for dbt (https://getdbt.com)
Apache License 2.0
216 stars 55 forks source link

request for enhanced properties feature #19

Open bachng2017 opened 2 years ago

bachng2017 commented 2 years ago

The properties feature the implement recently is very useful but it would be better if it could support following use-cases:

  1. combine the properties in general configuration (dbt_project.yml) and specific configuration (models). For now, if we want to set transactional=true for all models but partition just for few tables, we must also including the transactional=true in every models that we only need to change the partitions, because only one properties will be used here. What we want is set transactional=true in general level and only partition in specific models.
  2. the properties should ignores the key/value pairs if the key or value is null (or both). The use-case is that sometimes we need to control the property dynamically depend on some other variables. It is easier to use the jinja2 for key/value in this case to control whether we should include the key/value into the properties or not. For example, we have 2 env that only one needs to use the transaction. Utilize the DBT_TRANSACTION environment variable will help us to use the same code on both environments.
    +properties:
       transactional: "{{ env_var('DBT_TRANSACTION','true') }}"
findinpath commented 2 years ago

@bachng2017 thank you for the feedback. It is really nice to see that the new features come to use in the community.

Regarding the 1st point, please do describe in detail your use case with concrete code snippets. This would make it easier for us to understand what you are trying to achieve.

Same thing for the 2st point. It seems to me that the code snippet is taken out of a dbt model. What I don't quite understand is the limitation that you are speaking about. I assume that you want properties to be evaluated dynamically. I'm not sure whether this is possible in dbt , but i'll ask on dbt slack and get back with feedback.

findinpath commented 2 years ago

From dbt slack

https://getdbt.slack.com/archives/C2JRRQDTL/p1637588715264700?thread_ts=1637582164.264100&cid=C2JRRQDTL

you may make use of dbt macros for adding dynamic functionality to the config element of your dbt model:

config(partitions = [parse_partition()], ...)

or simply do this: https://docs.getdbt.com/docs/building-a-dbt-project/dont-nest-your-curlies


{{
    config(
        materialized = 'xxxxx',
        properties={
          "transactional":  var('DBT_TRANSACTION')
        }
        ...
    )
}}
bachng2017 commented 2 years ago

Let me clarify

  1. in the case we need not only transaction=true for all 10 models, but 3 models need to have partition info. What we would like to do is: set the transaction=true for all model in dbt_project.yml (a) and add parition to 1 models (b), which in total is 2 configurations.
project_path:
    + properties:
         transactional: true
     model1:  
          + properties:
                partitioned_by = ARRAY['dt']

But with the current implementation, we need to add configuration for all 10 models separately. In this case we could configure the global property as above and local property in model 1 like this, which overwrite the global

{{
    config(     
        properties={
          "transactional":  "true",
         "partitioned_by" = "ARRAY['dt']"
        }
        ...
    )
}}

But let image we have more models, more common properties, the configuration will be more redundant.

Notes that in this case partition and transaction are only samples. The point is how efficient to write the configuration for a large number of models with common properties and some few special cases.

  1. Yes, we could use var ad you've describe to achieve dynamic configuretion in dbt. But defining a transactional = true/false is not enough. The code will not run in the environment where transaction is not understood at all like this one. We need a way to "remove" the property dynamically.
    > create table test (id int) with (transactional='false');
    Query 20211206_115505_02613_jzqcx failed: Catalog 'hive' does not support table property 'transactional'

for .2 we are now have a small patch for adapters.sql as following. It works in our case but not sure for other cases

...
227 {% macro properties(properties) %}
228   {% set _properties = {} %}
229   {% for key, value in properties.items() %}
230     {% if value is defined and value != '' %}
231         {% do _properties.update({key:value}) %}
232     {% endif %}
233   {% endfor %}
234
235   {%- if _properties | length  -%}
236       WITH (
237           {%- for key, value in _properties.items() -%}
238             {{ key }} = {{ value }}
239             {%- if not loop.last -%}{{ ',\n  ' }}{%- endif -%}
240           {%- endfor -%}
241       )
242   {%- endif -%}
243 {%- endmacro -%}