kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

Add option to raise errors during `find_pipelines` #3823

Status: Closed (deepyaman closed this 1 month ago)

deepyaman commented 2 months ago

Description

Resolves #2910


deepyaman commented 2 months ago

> Implementation-wise, this looks fine. I think we should update the relevant documentation as well. I was also wondering whether we should update the starter to add the default, to make this more discoverable:
>
> pipelines = find_pipelines(raise_errors=False)

Good point! See 71e0f64 for documentation added in response.

deepyaman commented 1 month ago

> My only question is whether we should keep the default value raise_errors=False rather than switch to raise_errors=True. If I understood correctly, the feature was initially designed for developers, so that a half-developed pipeline won't prevent another pipeline from running. However, general users find debugging hard when an issue occurs and the run continues anyway. So we may consider making raise_errors=True the default behaviour, since developers are more likely than general users to be aware of this feature and the flag.

I'm OK with that, if that's the prevailing user feedback. @astrojuanlu @merelcht any opinion?

Also, I don't personally feel that this behavior change is "breaking" (it's pretty easy to update your registry, and hopefully you're not already running with a broken pipeline in production), but I would be happy to get others' views.
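
To make the trade-off under discussion concrete, here is a minimal, self-contained sketch of the two behaviours. It is not Kedro's implementation: `discover`, `loaders`, and the pipeline names are hypothetical stand-ins, and only the `raise_errors` flag name comes from this PR.

```python
import warnings
from typing import Callable, Dict


def discover(
    loaders: Dict[str, Callable[[], object]], raise_errors: bool = False
) -> Dict[str, object]:
    """Hypothetical stand-in for pipeline autodiscovery (not Kedro's code)."""
    found: Dict[str, object] = {}
    for name, load in loaders.items():
        try:
            found[name] = load()
        except Exception as exc:
            if raise_errors:
                # Strict mode: fail on the first pipeline that cannot be loaded.
                raise RuntimeError(f"Could not load pipeline '{name}'") from exc
            # Lenient mode: warn, skip the broken pipeline, keep going.
            warnings.warn(f"Skipping pipeline '{name}': {exc!r}")
    return found


def broken() -> object:
    # Simulates a half-developed pipeline module that fails on import.
    raise ImportError("missing dependency")


loaders = {"data_processing": lambda: "pipeline-object", "half_done": broken}

# Lenient mode: the working pipeline is still returned, with one warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    lenient = discover(loaders)
```

In lenient mode the broken pipeline is silently absent from the result unless the user notices the warning, which is the debugging pain point raised above. Calling `discover(loaders, raise_errors=True)` surfaces the failure immediately instead.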

merelcht commented 1 month ago

> > My only question is whether we should keep the default value raise_errors=False rather than switch to raise_errors=True. If I understood correctly, the feature was initially designed for developers, so that a half-developed pipeline won't prevent another pipeline from running. However, general users find debugging hard when an issue occurs and the run continues anyway. So we may consider making raise_errors=True the default behaviour, since developers are more likely than general users to be aware of this feature and the flag.
>
> I'm OK with that, if that's the prevailing user feedback. @astrojuanlu @merelcht any opinion?
>
> Also, I don't personally feel that this behavior change is "breaking" (it's pretty easy to update your registry, and hopefully you're not already running with a broken pipeline in production), but I would be happy to get others' views.

I think it's a good call to make the default raise_errors=True. We have indeed heard that users struggle with debugging, so raising errors explicitly by default will hopefully help.

astrojuanlu commented 1 month ago

Without a full understanding of https://github.com/kedro-org/kedro/issues/2401, I'd be wary of setting raise_errors=True on a micro release. We want to take a broader look at what kind of errors people find while debugging.

My recommendation would be to introduce the option but not change the behavior, so that we can take a more comprehensive look soon. That's also the safer option anyway in terms of backwards compatibility.