I used train_dist.py instead of train.py, with the same command I use for train.py. It fails with the following error:
File "train_dist.py", line 326, in <module>
mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
File "/data2/hwang/softwares/anaconda3/envs/encoding/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 162, in spawn
process.start()
File "/data2/hwang/softwares/anaconda3/envs/encoding/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/data2/hwang/softwares/anaconda3/envs/encoding/lib/python3.8/multiprocessing/context.py", line 283, in _Popen
return Popen(process_obj)
File "/data2/hwang/softwares/anaconda3/envs/encoding/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/data2/hwang/softwares/anaconda3/envs/encoding/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/data2/hwang/softwares/anaconda3/envs/encoding/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/data2/hwang/softwares/anaconda3/envs/encoding/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function main_worker at 0x7f465ede04c0>: attribute lookup main_worker on __main__ failed
Has anyone else run into this error? How can I solve it? Thanks.
@zhanghang1989
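For context, here is a minimal sketch of how this kind of pickling failure can be reproduced with the standard library alone. It is not taken from train_dist.py and may not be the cause in this case. It only illustrates that pickle stores a function by its module and name, so the function must be reachable as an attribute of that module when it is pickled for a spawned process. The module name `fake_main`, the helper `can_pickle`, and the stub `main_worker` body are hypothetical stand-ins.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the script's __main__ module (an assumption,
# used so the sketch does not depend on how it is launched).
fake_main = types.ModuleType("fake_main")
sys.modules["fake_main"] = fake_main


def main_worker(gpu, ngpus_per_node, args):
    # Stub body; the real worker would set up distributed training.
    return gpu


# Pretend the function was defined in fake_main.
main_worker.__module__ = "fake_main"
main_worker.__qualname__ = "main_worker"


def can_pickle(fn):
    """Return True if pickle can serialize fn by reference."""
    try:
        pickle.dumps(fn)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


# The name is not an attribute of fake_main yet, so the lookup fails.
missing = can_pickle(main_worker)

# Once the module exposes the same object under that name, pickling works.
fake_main.main_worker = main_worker
present = can_pickle(main_worker)

print("picklable while missing from module:", missing)
print("picklable once module exposes it:", present)
```

If something similar is happening here, it may be worth checking that `main_worker` is defined at the top level of train_dist.py and that nothing rebinds or shadows that name before `mp.spawn` is called.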