kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

Rename `_transcoding` module, deprecate old constant #3826

Closed deepyaman closed 1 month ago

deepyaman commented 2 months ago

Description

This leaves TRANSCODING_SEPARATOR in a public module; there is no expectation that it would move to a private module. Discussion in #3826.

Developer notes

Kedro-Viz will be fine, but https://github.com/kedro-org/kedro-viz/blob/main/package/kedro_viz/integrations/kedro/hooks.py#L15-L17 could get dropped again, if so desired. (Nobody needs to be on Kedro 0.19.4, since there will be no functional differences between that and 0.19.5.)

Alternatives considered

It's possible to rename the new _transcoding module to transcoding, but this avoids that and goes back to something very similar to what was there in 0.19.3 and below.

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

noklam commented 2 months ago

I do think the private module is leaky: the methods are private, but we then expose TRANSCODING_SEPARATOR through the public module.

The only absolute blocker here is the circular dependency; this PR is an alternative fix for it. @deepyaman makes the point that this shouldn't only be about 0.19 backward compatibility, and that we should continue to expose TRANSCODING_SEPARATOR in 0.20. That convinces me a bit more towards moving it to a public module.

For private/public, there are different ways.

  1. Move the module to public as transcoding.py instead of _transcoding.py, and hide anything unnecessary with __all__. (I don't think that's needed, because the rest of the methods are already private.)
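
To illustrate the `__all__` option, here is a minimal single-file sketch. The module name `transcoding_demo`, the helper `helper_not_exported`, and the separator value `"@"` are assumptions made for the demo, not the contents of the actual Kedro module.

```python
import sys
import types

# Hypothetical source for a public `transcoding` module (names are illustrative).
MODULE_SOURCE = '''
__all__ = ["TRANSCODING_SEPARATOR"]

TRANSCODING_SEPARATOR = "@"  # assumed value for the demo


def helper_not_exported(name):
    """Public-looking name that is deliberately left out of __all__."""
    return name.split(TRANSCODING_SEPARATOR)[0]


def _strip_transcoding(name):
    """Underscore-prefixed names are skipped by star-imports even without __all__."""
    return name.split(TRANSCODING_SEPARATOR)[0]
'''

# Build the module in memory so the sketch runs as a single file.
demo_module = types.ModuleType("transcoding_demo")
exec(MODULE_SOURCE, demo_module.__dict__)
sys.modules["transcoding_demo"] = demo_module

# A star-import only binds the names listed in __all__.
star_namespace = {}
exec("from transcoding_demo import *", star_namespace)
exported = sorted(name for name in star_namespace if not name.startswith("__"))
print(exported)  # ['TRANSCODING_SEPARATOR']
```

Note that `__all__` only shapes star-imports and signals intent; an explicit `from transcoding_demo import helper_not_exported` would still succeed.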

ElenaKhaustova commented 2 months ago

I agree with the point of moving TRANSCODING_SEPARATOR to public, but since _strip_transcoding is also imported and used by other components, it should be public as well. So, the solution of moving from _transcoding.py to transcoding.py makes the most sense to me.

IMO, from an implementation point of view, keeping the transcoding logic in a separate file looks better than moving it back to the pipeline and fixing the circular dependency.

deepyaman commented 2 months ago

Since there is some time to discuss this now, what will be the recommended/supported way to import TRANSCODING_SEPARATOR (and maybe strip_transcoding) going forward? Will it be:

  • from kedro.pipeline.transcoding import TRANSCODING_SEPARATOR
  • from kedro.pipeline.pipeline import TRANSCODING_SEPARATOR (I assume this won't be recommended, but code already uses it; will this be deprecated?)
  • Both?

noklam commented 2 months ago

Do both for 0.19, and keep only one from 0.20 onward. This is similar to https://github.com/kedro-org/kedro/pull/1837/files#diff-c9b9e2fdad60057c915a16d9caf8c11637750cd6094585b4ad2f583df619ddac

This keeps the codebase from diverging from main; we only need to remove the alias in 0.20.
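
One way such a temporary alias could be wired up is a module-level `__getattr__` (PEP 562) that forwards the old name and emits a `DeprecationWarning`. The sketch below is an assumption about the general pattern, not the code in this PR; the in-memory modules `transcoding_demo` and `pipeline_demo` and the value `"@"` are stand-ins.

```python
import types
import warnings

# New public home of the constant (stand-in for a `transcoding` module).
new_module = types.ModuleType("transcoding_demo")
new_module.TRANSCODING_SEPARATOR = "@"  # assumed value for the demo

# Old location: the name is no longer defined here, only served via __getattr__.
old_module = types.ModuleType("pipeline_demo")


def _deprecated_getattr(name):
    """Module-level __getattr__ (PEP 562): called only when normal lookup fails."""
    if name == "TRANSCODING_SEPARATOR":
        warnings.warn(
            f"{name!r} has moved to 'transcoding_demo'; the old alias will be removed.",
            DeprecationWarning,
            stacklevel=2,
        )
        return new_module.TRANSCODING_SEPARATOR
    raise AttributeError(f"module 'pipeline_demo' has no attribute {name!r}")


# In a real module file this would simply be `def __getattr__(name): ...`.
old_module.__getattr__ = _deprecated_getattr

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    separator = old_module.TRANSCODING_SEPARATOR

print(separator, caught[0].category.__name__)  # @ DeprecationWarning
```

With this shape, dropping the alias in a later release amounts to deleting the `__getattr__` branch, which is what keeps the two branches from diverging.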

deepyaman commented 1 month ago

Done!

deepyaman commented 1 month ago

However, we might consider exposing methods to apply the transcoding instead of the constant. It will guarantee that users follow the same logic when applying it rather than implementing their own based on the constant.

FWIW, I have actually found access to the constant most useful in plugin development. Functions to apply transcoding, split on the separator, and check whether something is transcoded could work, but in my eyes that is less of a user need.
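
For reference, the three helpers mentioned above could look roughly like the sketch below. All function names here are hypothetical (none is an existing Kedro API), and the separator value `"@"` is assumed for the demo.

```python
from __future__ import annotations

TRANSCODING_SEPARATOR = "@"  # assumed value for the demo


def is_transcoded(name: str) -> bool:
    """Return True if the dataset name carries a transcoding suffix."""
    return TRANSCODING_SEPARATOR in name


def split_transcoded(name: str) -> tuple[str, str]:
    """Split 'base@format' into (base, format); format is '' when absent."""
    base, _, fmt = name.partition(TRANSCODING_SEPARATOR)
    if TRANSCODING_SEPARATOR in fmt:
        raise ValueError(f"Expected at most one separator in {name!r}")
    return base, fmt


def strip_transcoding(name: str) -> str:
    """Drop the transcoding suffix, if any."""
    return name.split(TRANSCODING_SEPARATOR)[0]


def apply_transcoding(base: str, fmt: str) -> str:
    """Join a base dataset name and a format into a transcoded name."""
    return f"{base}{TRANSCODING_SEPARATOR}{fmt}"


print(split_transcoded("cars@pandas"))     # ('cars', 'pandas')
print(strip_transcoding("cars@spark"))     # cars
print(apply_transcoding("cars", "spark"))  # cars@spark
```

A plugin that only needs to recognise or build names could use helpers like these, while the bare constant remains the more flexible option for anything they do not cover.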