Allow to keep root level column name that contains nested fields

syou6162 commented 11 months ago

Why

I'm using dbt-osmosis to maintain the metadata for BigQuery, especially for column description. We can write column description with nested fields like

version: 2
models:
  - name: example_table
    columns:
      - name: root_column
        description: "Description of root_column"
      - name: root_column.sub1
        description: "Description of root_column.sub1"
      - name: root_column.sub2
        description: "Description of root_column.sub2"

After running dbt-osmosis yaml refactor models/example_table, the information of the root level column disappeared 😭 .

version: 2
models:
  - name: example_table
    columns:
      - name: root_column.sub1
        description: "Description of root_column.sub1"
      - name: root_column.sub2
        description: "Description of root_column.sub2"

Data users need the description of the root column.

What

I added a root level column name that contains nested fields.

Reference

https://github.com/z3z1ma/dbt-osmosis/pull/35

syou6162 commented 11 months ago

@z3z1ma Could you review this pull request?

z3z1ma commented 11 months ago

Left some comments. Reviewing from mobile. I think ripping up that ugly list comprehension is a good idea. Once comments are addressed I will merge.

syou6162 commented 11 months ago

@z3z1ma Thanks for the reply!

However, the comment you wrote from mobile does not seem to have submitted. Could you please check or re-submit your comment? I'll fix them!

z3z1ma commented 11 months ago

Hey @syou6162 I see how this works with your change. But, basically I think the code would be cleaner if we flipped the 2 lines!

for c in self.adapter.get_columns_in_relation(table):
    if hasattr(c, "flatten"):
        columns.extend([self.column_casing(exp.name) for exp in c.flatten()])
    columns.append(self.column_casing(c.name))

syou6162 commented 11 months ago

@z3z1ma Thanks for your review and comments!

Applying the code you suggested would generate, for example

version: 2
models: example_table
  - name: example_table
    columns: name: root_column.sub1
      - name: root_column.sub1
        description: "Description of root_column.sub1"
      - name: root_column.sub2
        description: "Description of root_column.sub2"
      - name: root_column
        description: "Description of root_column"

The root_column column is last, and I thought it would be better to have the root_column at the beginning for easier reading of the yaml. Based on the code you suggested, if we wanted to take into account the order of the columns, the code would look something like this

for c in self.adapter.get_columns_in_relation(table):
    tmp = []
    if hasattr(c, "flatten"):
        tmp.extend([self.column_casing(exp.name) for exp in getattr(c, "flatten", lambda: [c])()])
    tmp = [self.column_casing(c.name)] + tmp
    columns.extend(tmp)

However, this is the same behavior as https://github.com/z3z1ma/dbt-osmosis/pull/76/commits/2f6963849563c6b5bec357c852027f27aa0de833 and the commit does not use temporary variables like tmp. I think it is better to use https://github.com/z3z1ma/dbt-osmosis/pull/76/commits/2f6963849563c6b5bec357c852027f27aa0de833 from the point of view of behavior and code understanding. Let me know what you think (if you think the code using tmp is better, I will commit that code).

z3z1ma commented 11 months ago

Fair point, but how about?

for c in self.adapter.get_columns_in_relation(table):
    columns.append(self.column_casing(c.name))
    if hasattr(c, "flatten"):
        columns.extend([self.column_casing(exp.name) for exp in c.flatten()])

Ideally we just scratch the lambda thing and do this. Either way I will merge in the morning and cut a release. I appreciate your contribution! Next up I need to integrate ChatGPT :) cc: @syou6162

syou6162 commented 11 months ago

@z3z1ma Thanks, I thought your suggestion looked good so I pushed it to a011d79ad4c14dc3c6a764d01d8e02796aaea511 !

z3z1ma commented 11 months ago

LGTM 🚀

z3z1ma / dbt-osmosis

Allow to keep root level column name that contains nested fields #76

Why

What

Reference