mboudiaf / pytorch-meta-dataset

A non-official 100% PyTorch implementation of META-DATASET benchmark for few-shot classification

Exception when num_workers > 0 on Windows, works on linux #5

Open jfb54 opened 3 years ago

jfb54 commented 3 years ago

On Windows 10, if num_workers > 0, you get the following exception:

Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle generator objects
python-BaseException

Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
python-BaseException

mboudiaf commented 3 years ago

I believe this is due to the way my datasets are instantiated. For instance, when instantiating an EpisodicDataset, it creates a list of generators at https://github.com/mboudiaf/pytorch-meta-dataset/blob/5c4e85b149cf7079789190a6326c73bcc7efd1f6/pytorch_meta_dataset/pipeline.py#L100. The problem is that generator objects cannot be pickled, which is exactly what multiprocessing seems to be doing on Windows when it is activated (i.e. num_workers > 0). I suspect the way it works is that the dataset is created in the main process, and then pickled for the worker processes to load.
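To illustrate the suspected cause outside of the library, here is a minimal sketch. EagerDataset and is_picklable are hypothetical names made up for this example, not part of pytorch-meta-dataset; the only assumption is that the dataset stores generator objects as attributes at construction time.

```python
import pickle


class EagerDataset:
    """Hypothetical stand-in for a dataset that builds generators in __init__."""

    def __init__(self, num_classes):
        # Generator objects are created eagerly and stored on the instance.
        self.class_streams = [self._stream(c) for c in range(num_classes)]

    @staticmethod
    def _stream(class_id):
        # Endless generator that stands in for a per-class example reader.
        while True:
            yield class_id


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, AttributeError, pickle.PicklingError):
        return False


# The instance holds generator objects, so pickling it fails.
print(is_picklable(EagerDataset(3)))  # False
```

On Linux the default start method is fork, so the dataset is never pickled, which would explain why the same code runs there.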

So the workaround would be to remove this line and find a way to create the generators in the __iter__ function (only when needed, of course) rather than in __init__. This should be doable with a try/except. Given that I do not have Windows 10, I will unfortunately be unable to reproduce this error, but I would be happy to help debug it further :)
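A rough sketch of that idea, under stated assumptions: LazyDataset and its attributes are hypothetical and do not mirror the real EpisodicDataset, and a plain None check is used here in place of the try/except mentioned above. The instance only stores plain configuration until iteration starts, so it can be pickled and sent to a worker before any generator exists.

```python
import pickle


class LazyDataset:
    """Hypothetical dataset that defers generator creation to __iter__."""

    def __init__(self, num_classes):
        self.num_classes = num_classes
        self.class_streams = None  # built on first iteration, not here

    @staticmethod
    def _stream(class_id):
        # Endless generator that stands in for a per-class example reader.
        while True:
            yield class_id

    def __iter__(self):
        # Create the generators only when needed, inside the process that
        # actually iterates over the dataset.
        if self.class_streams is None:
            self.class_streams = [
                self._stream(c) for c in range(self.num_classes)
            ]
        while True:
            for stream in self.class_streams:
                yield next(stream)


if __name__ == "__main__":
    dataset = LazyDataset(3)
    # No generators are stored yet, so the instance can be pickled.
    payload = pickle.dumps(dataset)
    restored = pickle.loads(payload)
    iterator = iter(restored)
    print([next(iterator) for _ in range(4)])
```

Note that once iteration has started, the instance holds generators again and can no longer be pickled, so this only helps if the dataset is handed to the workers before it is iterated.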

jfb54 commented 3 years ago

Thanks for clarifying. I have a linux machine as well, so I am not blocked. I may try your suggestion.

mboudiaf commented 3 years ago

I have tried to fix the issue by implementing the workaround I proposed earlier. Please let me know if that fixes the issue on Windows! Thanks in advance :)

jfb54 commented 3 years ago

Thanks for working on this. Unfortunately, there is still an issue on Windows. With num_workers > 0, there is a new error:

ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'Reader.construct_class_datasets.<locals>.decode_image'
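For reference, this error is what pickle raises for any function defined inside another function, since it cannot be looked up by name from the module. A possible direction, sketched below with hypothetical names and a made-up signature (I have not checked what the real decode_image takes), would be to move the function to module level and bind its extra arguments with functools.partial.

```python
import pickle
from functools import partial


def make_local_decoder(image_size):
    # Nested function: its qualified name contains "<locals>", so pickle
    # cannot find it by name and refuses to serialize it.
    def decode_image(raw):
        return (raw, image_size)

    return decode_image


def decode_image(raw, image_size):
    # Module-level function: pickle stores a reference to it by name.
    return (raw, image_size)


def make_picklable_decoder(image_size):
    # partial objects can be pickled when the wrapped function can be.
    return partial(decode_image, image_size=image_size)


def can_pickle(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, AttributeError, pickle.PicklingError):
        return False


if __name__ == "__main__":
    print(can_pickle(make_local_decoder(84)))
    print(can_pickle(make_picklable_decoder(84)))
```

Both decoders behave the same when called; only the second one can cross a process boundary under the spawn start method used on Windows.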