kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0
9.48k stars 874 forks source link

Make `kedro-telemetry` a core dependency #3976

Open ElenaKhaustova opened 5 days ago

ElenaKhaustova commented 5 days ago

Description

Partly solves https://github.com/kedro-org/kedro-plugins/issues/726

Development notes

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

ElenaKhaustova commented 4 days ago

Unit tests fail if kedro-telemetry is installed in the test environment. Uninstalling package solves the problem locally.

astrojuanlu commented 4 days ago

Key question from me: does this create a circular dependency? kedro depending on kedro-telemetry and viceversa

ElenaKhaustova commented 4 days ago

Key question from me: does this create a circular dependency? kedro depending on kedro-telemetry and viceversa

Yes, that's what we obviously missed 🙈

@DimedS shall we consider moving kedro-telemetry to framework instead?

DimedS commented 4 days ago

I would be happy to incorporate the kedro-telemetry code into Kedro, and I believe it should be done. However, I thought that at this stage, it would be a significant change, so I decided to start with a minor step. As a temporary solution, I don't see any major issues with this type of circular dependency; it essentially means that the code is part of one package. I don't anticipate any usage problems with this approach. Nevertheless, I would be happy to discuss whether we are ready to integrate the telemetry code into Kedro, as this is definitely the correct long-term solution.

ElenaKhaustova commented 4 days ago

As far as I understand, pip indeed can deal with circular dependencies as long as they are both not build dependencies of each other - which is our case. https://discuss.python.org/t/handling-installation-of-circular-dependencies/25531/8

But from the code organisation perspective, it feels wrong to me unless we have reasonable arguments for keeping telemetry in a separate package. Now (with recent changes of making telemetry opt-out), we expect the kedro-telemetry package to be installed anyway, regardless of whether telemetry is collected, so it doesn't reduce framework dependencies. On the other side kedro-telemetry always goes with kedro, which means they probably should live in the same code base, otherwise, we need to understand what advantages we get from keeping them separate.

@DimedS, I agree with your points on long-term solution and am curious what others think.

DimedS commented 3 days ago

I also tested by creating circular dependencies in two libraries and uploading them to PyPI. Everything worked well, with no warnings during the upload to PyPI or while installing with pip. Based on this, I believe that for the next Kedro release, it would be better to integrate kedro-telemetry as a dependency of Kedro. For the subsequent major release, we can consider integrating the telemetry directly into Kedro, possibly in conjunction with splitting Kedro into different modules.

astrojuanlu commented 2 days ago

I believe that for the next Kedro release, it would be better to integrate kedro-telemetry as a dependency of Kedro.

Fantastic, thanks 👍🏼

For the subsequent major release, we can consider integrating the telemetry directly into Kedro, possibly in conjunction with splitting Kedro into different modules.

I 100 % agree.

ElenaKhaustova commented 1 day ago

Do we have to wait for a breaking release to move the kedro-telemetry code to Kedro? Currently, telemetry is not a dependency so adding the code into Kedro isn't a breaking change.

It's not a breaking change the only thing which is blocking it is some additional code restructuring that needs to be done before which @DimedS mentioned above https://github.com/kedro-org/kedro/pull/3976#issuecomment-2203341038. Otherwise, we can proceed with moving.