pytorch / text

Models, data loaders and abstractions for language processing, powered by PyTorch
https://pytorch.org/text
BSD 3-Clause "New" or "Revised" License

Declaring _MapStyleDataset inside function makes it unpicklable #2205

Open AnthoJack opened 11 months ago


🐛 Bug

Describe the bug

When trying to use a Dataset that was converted to map-style using data.functional.to_map_style_dataset, I encountered the following error message:

...
File "/usr/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'to_map_style_dataset.<locals>._MapStyleDataset'

After some research, I found the list of what is picklable in the Python pickle documentation and learned that, for a class to be picklable, it has to be defined at the top level of a module.

This isn't the case for _MapStyleDataset as it is declared within the to_map_style_dataset function
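For illustration, here is a minimal sketch that reproduces the same kind of failure without torchtext. The names make_local_dataset, _LocalDataset and try_pickle are hypothetical stand-ins, and the class is a plain Python class rather than a torch Dataset subclass, so this only mirrors the pattern described above. It is not the library's actual code.

```python
import pickle


def make_local_dataset(items):
    # Hypothetical stand-in for the pattern in the issue: the class is
    # defined inside the function, so its qualified name contains "<locals>".
    class _LocalDataset:
        def __init__(self, data):
            self._data = list(data)

        def __len__(self):
            return len(self._data)

        def __getitem__(self, idx):
            return self._data[idx]

    return _LocalDataset(items)


def try_pickle(obj):
    """Return the exception raised while pickling obj, or None on success."""
    try:
        pickle.dumps(obj)
    except (AttributeError, pickle.PicklingError) as exc:
        # The exact exception type and message depend on the Python version.
        return exc
    return None


error = try_pickle(make_local_dataset([1, 2, 3]))
print(type(error).__name__, error)
```

Pickle stores classes by reference (module name plus qualified name), and a class created inside a function cannot be looked up that way when the object is loaded again. Multiprocessing with the spawn start method pickles objects to send them to worker processes, which is where the traceback above originates.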

The fix seems simple enough: declare _MapStyleDataset outside the function. Is there anything that makes this undesirable? If not, I'll create a PR for it, but I would like some opinions first.
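A minimal sketch of the proposed change, assuming the class only needs to materialize the iterable and support indexing. The class body here is an assumption for illustration and uses a plain Python class instead of a torch Dataset subclass, so it may differ from the real torchtext implementation.

```python
import pickle


class _MapStyleDataset:
    # Defined at module top level, so pickle can locate it by
    # module name and qualified name when loading.
    def __init__(self, iter_data):
        # Assumption: materialize the iterable once so items can be indexed.
        self._data = list(iter_data)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, idx):
        return self._data[idx]


def to_map_style_dataset(iter_data):
    # The function keeps its signature; only the class location changes.
    return _MapStyleDataset(iter_data)


dataset = to_map_style_dataset(iter(["a", "b", "c"]))
restored = pickle.loads(pickle.dumps(dataset))
print(len(restored), restored[1])
```

Callers of to_map_style_dataset would not need to change under this sketch, since the function still returns an instance of the same class. The only visible difference is that _MapStyleDataset becomes a module-level name.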