kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

Update parallel_runner.py #3770

Closed noklam closed 6 months ago

noklam commented 6 months ago

Description

Development notes

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

noklam commented 6 months ago

Found the seemingly obsolete MemoryDataset used in ParallelRunner; removing it to see whether CI agrees.

noklam commented 6 months ago

Closed:

https://docs.python.org/3/library/multiprocessing.html#customized-managers

The complete story is that:

In a way, SharedMemoryDataset is a special wrapper dataset that is coupled with MemoryDataset.
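For anyone reading later, here is a rough sketch of the customized-manager pattern from the linked Python docs, showing how a wrapper can end up coupled to the class registered on the manager. It uses only the standard library and is not Kedro's actual code: `ToyMemoryStore`, `ToyManager`, `SharedToyStore`, and `demo` are hypothetical names made up for illustration.

```python
from multiprocessing.managers import SyncManager


class ToyMemoryStore:
    """Hypothetical stand-in for an in-memory dataset (not Kedro's class)."""

    def __init__(self):
        self._data = None

    def save(self, data):
        self._data = data

    def load(self):
        return self._data


class ToyManager(SyncManager):
    """Customized manager; subclassing keeps the registration off SyncManager."""


# After registration, manager.ToyMemoryStore() creates the object inside the
# manager's server process and returns a proxy to it.
ToyManager.register("ToyMemoryStore", ToyMemoryStore)


class SharedToyStore:
    """Hypothetical wrapper that delegates to a proxy obtained from a manager."""

    def __init__(self, manager):
        # The wrapper depends on the type registered on the manager, which is
        # the kind of coupling described above.
        self._proxy = manager.ToyMemoryStore()

    def save(self, data):
        self._proxy.save(data)

    def load(self):
        return self._proxy.load()


def demo():
    # Entering the context starts the manager's server process.
    with ToyManager() as manager:
        store = SharedToyStore(manager)
        store.save({"rows": 3})
        return store.load()


if __name__ == "__main__":
    print(demo())
```

In this sketch, dropping the `register` call would break the wrapper even though the wrapper never names `ToyMemoryStore` directly, so a registered class can look unused while still being required.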

astrojuanlu commented 6 months ago

Can the branch be deleted? (Didn't want to do it myself just in case)