jonathanking / sidechainnet

An all-atom protein structure dataset for machine learning.
BSD 3-Clause "New" or "Revised" License

Can't pickle local object 'get_collate_fn.<locals>.collate_fn' #56

Closed Aditya-Tandon closed 1 year ago

Aditya-Tandon commented 1 year ago

I am getting a pickle error when iterating over the DataLoader. It works fine on Colab but not on my Mac, and I'm unsure of the cause. Here's a screenshot of the error:

[Screenshot: pickle error traceback, 2023-02-15]

This is the code snippet causing the error:

```python
data = []
for batch in d["train"]:
    data.append(batch)
```
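For context, the error in the screenshot is easy to reproduce in isolation. Under the "spawn" start method, DataLoader workers must pickle their collate function, and a function defined inside another function cannot be pickled (the `get_collate_fn` below is a minimal stand-in, not SidechainNet's actual implementation):

```python
import pickle

def get_collate_fn():
    # A nested function is only reachable as
    # get_collate_fn.<locals>.collate_fn, so pickle
    # cannot look it up by a module-level name.
    def collate_fn(batch):
        return batch
    return collate_fn

try:
    pickle.dumps(get_collate_fn())
except (AttributeError, pickle.PicklingError) as exc:
    # e.g. "Can't pickle local object 'get_collate_fn.<locals>.collate_fn'"
    print(exc)
```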

jonathanking commented 1 year ago

Hi, thanks for your interest in SidechainNet!

Can you share your notebook/the offending code? I'm not sure why multiprocessing is being called.

Aditya-Tandon commented 1 year ago

Hi, Here's a screenshot of the code with the error:

[Screenshot: code cell and error, 2023-03-08]

Thanks for helping!

jonathanking commented 1 year ago

Hmm, I'm sorry, but I'm unable to reproduce the issue you're experiencing. Let's try to figure this out.

Aditya-Tandon commented 1 year ago

Hi, Yes, I am using sidechainnet version 0.7.6.

I checked your Colab notebook and it's definitely the same as what I am doing.

Yeah sure, here's the error I receive on running the cell along with the cell that produces the error:

Code:

```python
for i in data2['train']:
    break

print(type(i))
```

Error:

```
AttributeError                            Traceback (most recent call last)
Cell In[4], line 1
----> 1 for i in data2['train']:
      2     break
      4 print(type(i))

File /opt/anaconda3/lib/python3.9/site-packages/torch/utils/data/dataloader.py:444, in DataLoader.__iter__(self)
    442     return self._iterator
    443 else:
--> 444     return self._get_iterator()

File /opt/anaconda3/lib/python3.9/site-packages/torch/utils/data/dataloader.py:390, in DataLoader._get_iterator(self)
    388 else:
    389     self.check_worker_number_rationality()
--> 390     return _MultiProcessingDataLoaderIter(self)

File /opt/anaconda3/lib/python3.9/site-packages/torch/utils/data/dataloader.py:1077, in _MultiProcessingDataLoaderIter.__init__(self, loader)
   1070 w.daemon = True
   1071 # NB: Process.start() actually take some time as it needs to
   1072 # start a process and pass the arguments over via a pipe.
   1073 # Therefore, we only add a worker to self._workers list after
   1074 # it started, so that we do not call .join() if program dies
   1075 # before it starts, and __del__ tries to join but will get:
   1076 #     AssertionError: can only join a started process.
-> 1077 w.start()
   1078 self._index_queues.append(index_queue)
   1079 self._workers.append(w)

File /opt/anaconda3/lib/python3.9/multiprocessing/process.py:121, in BaseProcess.start(self)
    118 assert not _current_process._config.get('daemon'), \
    119        'daemonic processes are not allowed to have children'
    120 _cleanup()
--> 121 self._popen = self._Popen(self)
    122 self._sentinel = self._popen.sentinel
    123 # Avoid a refcycle if the target function holds an indirect
    124 # reference to the process object (see bpo-30775)

File /opt/anaconda3/lib/python3.9/multiprocessing/context.py:224, in Process._Popen(process_obj)
    222 @staticmethod
    223 def _Popen(process_obj):
--> 224     return _default_context.get_context().Process._Popen(process_obj)

File /opt/anaconda3/lib/python3.9/multiprocessing/context.py:284, in SpawnProcess._Popen(process_obj)
    281 @staticmethod
    282 def _Popen(process_obj):
    283     from .popen_spawn_posix import Popen
--> 284     return Popen(process_obj)

File /opt/anaconda3/lib/python3.9/multiprocessing/popen_spawn_posix.py:32, in Popen.__init__(self, process_obj)
     30 def __init__(self, process_obj):
     31     self._fds = []
---> 32     super().__init__(process_obj)

File /opt/anaconda3/lib/python3.9/multiprocessing/popen_fork.py:19, in Popen.__init__(self, process_obj)
     17 self.returncode = None
     18 self.finalizer = None
---> 19 self._launch(process_obj)

File /opt/anaconda3/lib/python3.9/multiprocessing/popen_spawn_posix.py:47, in Popen._launch(self, process_obj)
     45 try:
     46     reduction.dump(prep_data, fp)
---> 47     reduction.dump(process_obj, fp)
     48 finally:
     49     set_spawning_popen(None)

File /opt/anaconda3/lib/python3.9/multiprocessing/reduction.py:60, in dump(obj, file, protocol)
     58 def dump(obj, file, protocol=None):
     59     '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60     ForkingPickler(file, protocol).dump(obj)

AttributeError: Can't pickle local object 'get_collate_fn.<locals>.collate_fn'
```

I wonder if it might just be a Mac issue, since the same code runs perfectly on a Linux machine with the same versions of Python and VS Code.

jonathanking commented 1 year ago

Oh interesting. Thanks for mentioning that. I really only develop this code on linux. I used to have continuous integration tests for Mac, but they died when Travis CI went private.

Do you have an M1 Mac? Others have reported issues with dataloaders on Apple Silicon (example). In that thread, they suggest setting num_workers to 0. You can try this in scn.load().
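For illustration, the `num_workers=0` workaround can be sketched with a plain PyTorch DataLoader and a stand-in closure (this `get_collate_fn` is a hypothetical stand-in, not SidechainNet's actual implementation). With zero workers, collation happens in the main process and the closure is never pickled:

```python
import torch
from torch.utils.data import DataLoader

def get_collate_fn():
    # Stand-in for a locally defined collate function
    # that cannot be pickled for worker processes.
    def collate_fn(batch):
        return torch.stack(batch)
    return collate_fn

dataset = [torch.tensor([i]) for i in range(4)]

# num_workers=0 keeps loading in the main process, so no pickling occurs.
loader = DataLoader(dataset, batch_size=2, num_workers=0,
                    collate_fn=get_collate_fn())

batches = [batch.tolist() for batch in loader]
print(batches)  # [[[0], [1]], [[2], [3]]]
```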

Edit: I have also seen discussions suggesting the multiprocessing "fork" start method instead of "spawn", the default on macOS.
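As a minimal sketch of the "fork" approach (macOS/Linux only; "fork" is unavailable on Windows): a forked child inherits the parent's memory, so a local closure never needs to be pickled, whereas "spawn" would raise the error above when starting the process. The `get_collate_fn` here is again a hypothetical stand-in:

```python
import multiprocessing as mp

def get_collate_fn(queue):
    # Local closure: unpicklable, but fine under "fork",
    # because the child inherits it rather than unpickling it.
    def collate_fn(batch):
        queue.put(sum(batch))
    return collate_fn

if __name__ == "__main__":
    ctx = mp.get_context("fork")
    queue = ctx.Queue()
    worker = ctx.Process(target=get_collate_fn(queue), args=([1, 2, 3],))
    worker.start()
    worker.join()
    print(queue.get())  # 6
```

PyTorch's DataLoader also accepts a `multiprocessing_context` argument (e.g. `multiprocessing_context="fork"`) that selects the start method per-loader without changing the global default.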

Aditya-Tandon commented 1 year ago

Yes, I have an M1 Mac. Both setting num_workers to 0 and switching multiprocessing to fork fix the issue.

Thanks for the help! :)