Improve the method to retrieve progenitor

z3z1ma / dbt-osmosis

Provides automated YAML management, a dbt server, streamlit workbench, and git-integrated dbt model output diff tools

https://z3z1ma.github.io/dbt-osmosis/

Apache License 2.0

422 stars 45 forks source link

Improve the method to retrieve progenitor #89

Closed syou6162 closed 9 months ago

syou6162 commented 9 months ago

https://github.com/z3z1ma/dbt-osmosis/pull/83 added progenitor information to meta, and I found that progenitor (and column description) is wrong in some cases.

A typical case is the following.

graph LR;
  A.myColumn --> B.my_column;
  B.my_column --> C.my_column;

From the above figure, the true progenitor of C.my_column(snake_case) is A.myColumn(camelCase). But current implementation does not care about this kind of column lineage, so it outputs B.my_column as the progenitor of C.my_column.

I have improved the method to get the progenitor.

syou6162 commented 9 months ago

@z3z1ma By the way, I would like to see a test of dbt-osmosis to make sure I am not unintentionally breaking existing behavior, as well as this improvement and the pull requests I recently sent. I use dbt-osmosis for BigQuery, and I'm always sending pull requests with a sense of dread to see if the behavior in Snowflake has gone awry 😅.

I know there are some functions that are hard to test, but I think staticmethod functions are relatively easy to write tests for and can be tested on GitHub actions. If you would like, I will send another pull requests in the future to add these types of tests, but I would like to hear your thoughts.

z3z1ma commented 9 months ago

Yeah 100% Its quite the hill to climb, the tool was originally composed in such a way that we wanted as little code as possible, it would perform its function and be feature complete until internal dbt-core changes eventually break it. It was tested functionally on many dbt projects including a huge internal one pretty much constantly -- which is what kept it stable with all my tinkering. Now my attention is split on some new data engineering efforts, so tests here would be a boon in order to move faster on interesting features. Its the only thing I think that plays with dbt internals and the Graph in this way. cc @syou6162

syou6162 commented 9 months ago

@z3z1ma OK, I'll add more tests after this pull request is merged!

syou6162 commented 9 months ago

@z3z1ma Thank you for your advices! I have corrected the points you pointed out.