Rethink how Kedro can play a role in multiprocessing / performance boost

Description

ParallelRunner lets users run their pipeline with multiprocessing by passing one extra argument, with no extra code. In reality, it is rarely used. Should we continue developing it or lower its priority? We have had many discussions about runners, so this issue is mainly for facilitating and documenting that discussion.

I posted a question in Slack recently to see what the community thinks about it: https://linen-slack.kedro.org/t/16663577/do-you-use-kedro-run-runner-parallerunner-to-speed-up-your-p#99abccb0-7970-4a65-8fad-85fd22681beb

The ecosystem has evolved, and libraries are solving multiprocessing in their own way (I think pandas is still lagging behind, but polars has more or less solved it).

ParallelRunner solves only a subset of multiprocessing use cases. To solve this realistically, users need finer-grained control, and the current ParallelRunner does not provide it. For example, a GPU-specific workflow cannot be multi-process: you want the GPU training to happen in one specific process while other processes handle the non-GPU nodes (this sounds a bit like the "group node" deployment problem, but it affects local development too).
- 3094
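To make the "finer-grained control" idea concrete, here is a minimal sketch of routing nodes to different workers by tag. It is not Kedro API: `run_in_lanes`, the `(name, func, tags)` tuples, and the `"gpu"` tag are all hypothetical, and the sketch assumes the nodes are independent (no dependency ordering). Threads stand in for processes to keep it short; `ProcessPoolExecutor` exposes the same `submit` interface.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def run_in_lanes(nodes, gpu_lane, cpu_pool):
    """Submit each (name, func, tags) node to the executor matching its tags.

    Hypothetical helper, not part of Kedro. Assumes nodes are independent.
    """
    futures = {}
    for name, func, tags in nodes:
        # GPU-tagged nodes all go to the single-worker lane.
        executor = gpu_lane if "gpu" in tags else cpu_pool
        futures[name] = executor.submit(func)
    return {name: future.result() for name, future in futures.items()}


def report_worker():
    """Stand-in for a node body: report which worker ran it."""
    return threading.current_thread().name


nodes = [
    ("preprocess", report_worker, set()),
    ("train_model", report_worker, {"gpu"}),
    ("make_report", report_worker, set()),
    ("evaluate_model", report_worker, {"gpu"}),
]

# One dedicated worker for GPU nodes, a small pool for everything else.
with ThreadPoolExecutor(max_workers=1, thread_name_prefix="gpu-lane") as gpu_lane, \
        ThreadPoolExecutor(max_workers=4, thread_name_prefix="cpu-pool") as cpu_pool:
    workers = run_in_lanes(nodes, gpu_lane, cpu_pool)

for name, worker in workers.items():
    print(f"{name} ran on {worker}")
```

Because the GPU lane has a single worker, both GPU-tagged nodes run on the same worker one after another, while the other nodes are free to run concurrently.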

On the other hand:

async, `CachedDataset`, or kedro-accelerator seem to be more practical ways to speed up Kedro. I am not very up to date on async myself, but it may be worth putting more effort into these instead of fixing ParallelRunner.
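As a rough illustration of why these two ideas help, the sketch below combines concurrent loading with an in-memory cache. It uses only the standard library and does not reproduce how Kedro implements either feature; `SlowDataset`, `CachingWrapper`, and `load_all` are hypothetical names, and the `time.sleep` call stands in for slow I/O.

```python
from concurrent.futures import ThreadPoolExecutor
import time

_MISSING = object()  # sentinel meaning "nothing cached yet"


class SlowDataset:
    """Hypothetical dataset whose load() is dominated by I/O wait."""

    def __init__(self, value, delay=0.05):
        self.value = value
        self.delay = delay
        self.load_calls = 0

    def load(self):
        self.load_calls += 1
        time.sleep(self.delay)  # simulated disk or network latency
        return self.value


class CachingWrapper:
    """Keep the loaded data in memory so later loads skip the I/O."""

    def __init__(self, dataset):
        self._dataset = dataset
        self._cache = _MISSING

    def load(self):
        if self._cache is _MISSING:
            self._cache = self._dataset.load()
        return self._cache


def load_all(datasets):
    """Load every dataset concurrently; threads overlap the I/O waits."""
    with ThreadPoolExecutor(max_workers=len(datasets)) as pool:
        futures = {name: pool.submit(ds.load) for name, ds in datasets.items()}
        return {name: future.result() for name, future in futures.items()}


raw = {"customers": SlowDataset([1, 2, 3]), "orders": SlowDataset([4, 5])}
datasets = {name: CachingWrapper(ds) for name, ds in raw.items()}

first = load_all(datasets)   # pays the I/O cost once, in parallel
second = load_all(datasets)  # served from memory
print(first, [ds.load_calls for ds in raw.values()])
```

Neither technique needs multiple processes, which is part of why they may be easier to adopt than ParallelRunner for I/O-bound pipelines.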

Development:

We have previously discussed rewriting the Runners with async to simplify the codebase. It is not yet clear what extra benefit we would get, since we haven't discussed it in detail.
New Runner? #2716: we created a new runner last year but haven't merged it back into Kedro. It can be installed from PyPI (https://pypi.org/project/kedro-softfail-runner/).

kedro-org / kedro

Rethink how Kedro can play a role in multiprocessing / performance boost #3713
