Closed Minyus closed 2 years ago
Related to https://github.com/quantumblacklabs/kedro/issues/420 @deepyaman
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Can anyone help with this issue?
Hi @deepyaman @yetudada, could you please help with this issue?
@Minyus is this still an issue with the latest Kedro version?
@lorenabalan I was able to reproduce this issue with Kedro version 0.17.4, Python 3.8.10, Ubuntu 20.04 LTS.
Hi @Minyus, thank you for reporting the issue! Unfortunately I don't think we'll add support for using CachedDataSet with the ParallelRunner. Making MemoryDataSet and CachedDataSet work with multiprocessing would make those datasets overly complicated, and we do not plan to do so in the near future. However, if you fancy giving it a try, you can create a custom CachedDataSet leveraging SharedMemory, newly introduced in Python 3.8. Kedro still supports versions prior to 3.8, so unfortunately we cannot use this class, which would make the implementation of multiprocessing-friendly CachedDataSet and MemoryDataSet classes much less complicated.
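For anyone who wants to try the SharedMemory route, here is a rough sketch of the underlying idea using only the standard library (Python 3.8+). It is not a Kedro dataset: the class name SharedMemoryCache and its save/load/release methods are made up for illustration, and a real custom CachedDataSet would still have to wire something like this into Kedro's dataset interface.

```python
import pickle
from multiprocessing import shared_memory


class SharedMemoryCache:
    """Hypothetical helper: keeps one pickled object in a named shared memory block."""

    def __init__(self):
        self._shm = None  # block created (and owned) by this instance

    def save(self, data):
        """Pickle `data` into a fresh block and return (name, size) for readers."""
        payload = pickle.dumps(data)
        self.release()  # drop any block left over from a previous save
        self._shm = shared_memory.SharedMemory(create=True, size=len(payload))
        self._shm.buf[: len(payload)] = payload
        # The block may be rounded up to a page size, so readers need the
        # exact payload length as well as the block name.
        return self._shm.name, len(payload)

    @staticmethod
    def load(name, size):
        """Attach to an existing block by name and unpickle its first `size` bytes."""
        shm = shared_memory.SharedMemory(name=name)
        try:
            return pickle.loads(bytes(shm.buf[:size]))
        finally:
            shm.close()

    def release(self):
        """Close and remove the owned block, if there is one."""
        if self._shm is not None:
            self._shm.close()
            self._shm.unlink()
            self._shm = None
```

The (name, size) pair is small and picklable, so it could be handed to worker processes, which would then call SharedMemoryCache.load(name, size). The owning process has to call release() once everyone is done, otherwise the block leaks.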
In the meantime, we should mark this dataset as non-usable by the ParallelRunner, the same way we do for MemoryDataSet here: https://github.com/kedro-org/kedro/blob/main/kedro/runner/parallel_runner.py#L195-L238
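To illustrate what "marking a dataset as non-usable" could look like, here is a simplified, standalone sketch of an up-front catalog check. It does not reproduce the linked Kedro code: the stand-in classes, the function validate_for_parallel_run, and the flag name _SINGLE_PROCESS are all assumptions made for this example.

```python
class StandInCSVDataSet:
    """Stand-in for a file-backed dataset that any process can load."""


class StandInCachedDataSet:
    """Stand-in for a dataset that keeps its data in one process's memory."""

    _SINGLE_PROCESS = True  # assumed flag name, used here for illustration only

    def __init__(self, wrapped):
        self.wrapped = wrapped


def validate_for_parallel_run(catalog):
    """Raise ValueError if any dataset in `catalog` is flagged as single-process.

    `catalog` is a plain dict mapping dataset names to dataset objects.
    """
    blocked = sorted(
        name
        for name, dataset in catalog.items()
        if getattr(dataset, "_SINGLE_PROCESS", False)
    )
    if blocked:
        raise ValueError(
            f"The following datasets cannot be used with multiprocessing: {blocked}"
        )
```

Failing before any node runs gives the user a clear message naming the offending datasets, instead of an error from deep inside a worker process.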
Closing this issue. We'll make it clear that CachedDataSet cannot be used with the ParallelRunner.
Description
Using CachedDataSet and ParallelRunner together fails.
Context
CachedDataSet and ParallelRunner are often used to make the pipeline run faster, but using both with Kedro 0.17.0 fails.
Steps to Reproduce
Run kedro new --starter=pandas-iris and generate the Kedro project.
Modify conf/base/catalog.yml to use CachedDataSet.
Run kedro run -p.
Expected Result
Complete without error.
Actual Result
Tell us what happens instead.
Your Environment
Kedro version used (pip show kedro or kedro -V): 0.17.0
Python version used (python -V): 3.7.7