z3z1ma / dbt-osmosis

Provides automated YAML management, a dbt server, streamlit workbench, and git-integrated dbt model output diff tools
https://z3z1ma.github.io/dbt-osmosis/
Apache License 2.0
422 stars 46 forks source link

bug: data type precision gets overridden by dbt-osmosis #157

Closed brandonpeebles closed 2 weeks ago

brandonpeebles commented 3 weeks ago

Issue:

If I set the data type precision/scale in YAML (example NUMBER(38, 0), it gets overridden back to the default NUMBER when I run dbt-osmosis.

Impact:

This makes it difficult to maintain the new requirements dbt's model contracts feature which warn you to specify this for numeric columns

Detected columns with numeric type and unspecified precision/scale, this can lead to unintended rounding: ['YEAR_OF_BIRTH']`

Their docs here: https://docs.getdbt.com/docs/collaborate/govern/model-contracts

Note that you need to specify a varchar size or numeric scale, otherwise dbt relies on default values. For example, if a numeric type defaults to a precision of 38 and a scale of 0, then the numeric column stores 0 digits to the right of the decimal (it only stores whole numbers), which might cause it to fail contract enforcement. To avoid this implicit coercion, specify your data_type with a nonzero scale, like numeric(38, 6). dbt Core 1.7 and higher provides a warning if you don't specify precision and scale when providing a numeric data type.

kokorin commented 2 weeks ago

I checked dbt-adapters and dbt-osmosis code, it looks the fix is quite simple: In DbtYamlManager.get_columns_meta it's required to replace c.dtype with c.data_type:

for c in self.adapter.get_columns_in_relation(table):
    if any(re.match(pattern, c.name) for pattern in blacklist):
        continue
    columns[self.column_casing(c.name)] = ColumnMetadata(
        name=self.column_casing(c.name),
        type=c.data_type,
        index=None,
        comment=getattr(c, "comment", None),
    )

data_type() returns precision and scale if available:

@property
def data_type(self) -> str:
    if self.is_string():
        return self.string_type(self.string_size())
    elif self.is_numeric():
        return self.numeric_type(self.dtype, self.numeric_precision, self.numeric_scale)
    else:
        return self.dtype

@classmethod
def numeric_type(cls, dtype: str, precision: Any, scale: Any) -> str:
    # This could be decimal(...), numeric(...), number(...)
    # Just use whatever was fed in here -- don't try to get too clever
    if precision is None or scale is None:
        return dtype
    else:
        return "{}({},{})".format(dtype, precision, scale)
z3z1ma commented 2 weeks ago

Fixed in 0.13.0