kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

Can we remove `ParallelRunner`? #4291

Open merelcht opened 1 week ago

merelcht commented 1 week ago

Description

Check telemetry to find out how much ParallelRunner is used and whether it would be possible to deprecate and remove it.

Context

ParallelRunner is the most problematic among our runners:

And there's already a bit of evidence it's not used that much: https://linen-slack.kedro.org/t/16663577/do-you-use-kedro-run-runner-parallerunner-to-speed-up-your-p#99abccb0-7970-4a65-8fad-85fd22681beb

astrojuanlu commented 3 days ago

We discussed that there is indeed evidence that some people at least try to use this runner, but because it's so broken, they're rarely successful or they have to battle workarounds.

We agreed to do some research to understand the underlying issues and see if the ParallelRunner is still the best solution to them.

yury-fedotov commented 1 day ago

I'm using ParallelRunner in most projects... Surprised if it's unpopular, based on telemetry.

astrojuanlu commented 1 day ago

Thanks @yury-fedotov, that's good to know. Does it work well for your needs? Have you been hit by any issues?