z3z1ma / dbt-osmosis

Provides automated YAML management, a dbt server, streamlit workbench, and git-integrated dbt model output diff tools
https://z3z1ma.github.io/dbt-osmosis/
Apache License 2.0
422 stars 45 forks source link

Keep column description if meta.osmosis_keep_description is True #112

Closed syou6162 closed 8 months ago

syou6162 commented 9 months ago

Background

The --force-inheritance option is useful for updating columns in the propagator when the description of a column on the data lake side is updated. In the long-term operation of dbt-osmosis, you will often use the --force-inheritance option, but sometimes you may not want the column description to be updated with this option.

Example

Suppose you have a column named shop_id in your data lake with a column description of "ID of shop". Suppose you create a data mart about a user by referencing shop_id in this data lake. Suppose this data mart contains a column called shop_id, but you want to include a description for the column "ID of the latest shop where the user made a purchase". In this case, the --force-inheritance option will overwrite "ID of the latest shop where the user made a purchase" with just "ID of shop" 😭 .

Of course, you should always try to name the columns better in the data mart (e.g., latest_shop_id), but due to historical reasons, it may sometimes be difficult to rename columns.

What I Changed

Don't update the column description if the column's meta.osmosis_keep_description is True, even if the --force-inheritance option is used. See testing for more detailed specification.

I've modified the behavior so that the column description won't be updated if the column's meta.osmosis_keep_description is set to True, even when using the --force-inheritance option. Please refer to the tests for more detailed specifications.

syou6162 commented 9 months ago

@z3z1ma Could you review this pull request?

z3z1ma commented 8 months ago

Will review this today, sorry for delay. On brief vacation. The premise makes total sense to me.

syou6162 commented 8 months ago

@z3z1ma Please feel free to comment if you are unclear about the intent of any of the changes.

z3z1ma commented 8 months ago

Thanks for patience. This looks good. The test case too! 🙏 🏆

Very nice addition. Generally I think that a column of the same name should be defined as high up in the tree as possible and it should not change without a change of column name as well, which is therefore essentially a new node. Following this pattern makes dbt osmosis work at its DRYest and most effective. But in the long tail of ongoing usage and maintenance I see what you are saying for sure.

Using force inheritance is too aggressive if a downstream child has a proper description. So this is a happy middle ground I think. We could consider a flag that accepts. comma separated list of column_names to exclude from propagation which gives you intra-model control but I think specifying it in the yaml as you generate the description is nicer.

syou6162 commented 8 months ago

Using force inheritance is too aggressive if a downstream child has a proper description. So this is a happy middle ground I think. We could consider a flag that accepts. comma separated list of column_names to exclude from propagation which gives you intra-model control but I think specifying it in the yaml as you generate the description is nicer.

@z3z1ma Thank you very much. It would have been nice to discuss the issue beforehand, and then discuss the background and specifications. I apologize for sending the pull request suddenly and without consulting you.

Thanks also for the merge, and I look forward to the next release!