kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

Add documentation about correct usage of `configure_project` #3707

Closed: noklam closed this 2 weeks ago

noklam commented 4 months ago

Description

Fix #3704. This problem is tricky to resolve from the error message alone. If users run Kedro with multiprocessing themselves, they need to handle logging and configure_project carefully, and there is no public API we provide to help them do so.
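The failure mode described above can be sketched with the standard library alone. Under the "spawn" start method (the default on Windows and macOS), each worker starts a fresh interpreter and re-imports the module, so whatever configure_project set up in the parent process is missing in the worker until it is called again there. configure_project_stub, current_package and run_in_spawned_worker are hypothetical names; the stub only mimics "module-level state that must be initialised once per process" and is not Kedro's implementation. In a real project the initializer would presumably call kedro.framework.project.configure_project with the project's package name instead.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

# Hypothetical stand-in for the module-level state that configure_project
# sets up. NOT Kedro's code: just a global that each process must initialise.
_PACKAGE_NAME = None


def configure_project_stub(package_name):
    """Record the project package name in this process's module state."""
    global _PACKAGE_NAME
    _PACKAGE_NAME = package_name


def current_package():
    """Report which package this process thinks is configured."""
    return _PACKAGE_NAME


def run_in_spawned_worker(initializer=None, initargs=()):
    # "spawn" starts a fresh interpreter that re-imports this module,
    # so configuration done in the parent process is not inherited.
    ctx = mp.get_context("spawn")
    with ProcessPoolExecutor(
        max_workers=1, mp_context=ctx, initializer=initializer, initargs=initargs
    ) as pool:
        return pool.submit(current_package).result()


if __name__ == "__main__":
    configure_project_stub("my_project")  # configures the parent process only
    print(current_package())  # my_project
    print(run_in_spawned_worker())  # None: the worker was never configured
    # Configure each worker as it starts, via the pool initializer.
    print(run_in_spawned_worker(configure_project_stub, ("my_project",)))  # my_project
```

The `if __name__ == "__main__":` guard matters here: with "spawn", the worker re-imports the main script, and without the guard it would try to start new processes while bootstrapping.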

Review Notes

The issue was discovered when we tried to get kedro-viz running with ParallelRunner. In the end we didn't fix the issue in kedro-viz, as it turned out to be quite tricky to implement a hook that works with ParallelRunner. We don't have a working example, so I didn't include one in this PR.

This change explains how configure_project should be used and tries to give users a pointer when they run into this mysterious error message.

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.


astrojuanlu commented 2 months ago

@noklam Do you intend to keep working on this? Otherwise we can close the PR and properly groom the issue at some other time.

noklam commented 2 weeks ago

@astrojuanlu I've fixed the PR and it is ready to be reviewed.

noklam commented 2 weeks ago

> ...However, is there a chance we open an issue about the underlying problem? Maybe there's something we can do at the API level to make this easier to get right the first time.

@astrojuanlu I don't know exactly what that issue would be; we already have some related issues and discussions open about this problem: https://github.com/kedro-org/kedro-viz/issues/1801#issuecomment-2035086525

The most concrete problem I have had so far is: "How do you write a stateful hook that can run with ParallelRunner?"
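The stateful-hook problem can be illustrated with a stdlib-only sketch, assuming (as with ParallelRunner) that each node runs in a separate worker process holding its own copy of the hook. NodeCounterHook, _run_node and run_nodes_in_processes are hypothetical names; this is plain multiprocessing, not Kedro's hook API or ParallelRunner's implementation. State kept on `self` is only updated in the worker's copy, so the parent never sees it. Keeping the state in a multiprocessing.Manager list is shown as one possible workaround, not something Kedro is known to provide. The sketch uses the "fork" start method so it runs without a main guard, which makes it POSIX-only.

```python
import multiprocessing as mp


class NodeCounterHook:
    """Hypothetical stateful hook (plain Python, not Kedro's hook machinery).

    Records the name of every node it sees in ``self.seen``.
    """

    def __init__(self, seen=None):
        # Default: an ordinary list that lives only in the current process.
        self.seen = [] if seen is None else seen

    def after_node_run(self, node_name):
        self.seen.append(node_name)


def _run_node(hook, node_name):
    # Executed in a worker process, which holds its own copy of ``hook``.
    hook.after_node_run(node_name)


def run_nodes_in_processes(hook, node_names):
    """Run each 'node' in its own process and return what the parent sees."""
    ctx = mp.get_context("fork")  # POSIX only; keeps the sketch guard-free
    procs = [ctx.Process(target=_run_node, args=(hook, n)) for n in node_names]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # Slicing copies the contents into a plain list (works for list or ListProxy).
    return hook.seen[:]


if __name__ == "__main__":
    # Attribute state: each worker appends to its own copy; the parent sees nothing.
    print(run_nodes_in_processes(NodeCounterHook(), ["a", "b"]))  # []

    # Workaround sketch: state in a Manager-backed list shared across processes.
    with mp.get_context("fork").Manager() as manager:
        hook = NodeCounterHook(seen=manager.list())
        print(sorted(run_nodes_in_processes(hook, ["a", "b"])))  # ['a', 'b']
```

The Manager route costs a server process and an inter-process round trip on every update, so whether it suits a real hook is still an open design question.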

astrojuanlu commented 2 weeks ago

What I didn't fully understand is why configure_project or bootstrap_project breaks (or behaves differently) under ParallelRunner.