Closed zhaishengfu closed 2 years ago
Can you try to set num_workers to 0? This should give a more informative error message.
Hello, below is my detailed test process:
When I set both the train_dataloader and mocap_dataloader num_workers to zero:
train_dataloader = torch.utils.data.DataLoader(self.train_dataset, self.cfg.TRAIN.BATCH_SIZE, shuffle=True, drop_last=True,
num_workers=self.cfg.GENERAL.NUM_WORKERS) #zsf test, what is the problem??
mocap_dataloader = torch.utils.data.DataLoader(self.mocap_dataset, self.cfg.TRAIN.NUM_TRAIN_SAMPLES * self.cfg.TRAIN.BATCH_SIZE, shuffle=True, drop_last=True, num_workers=0)
After that I found that the COCO image directory for training was incorrect. I fixed it, and the training process now seems to run fine.
If I set only the train_dataloader num_workers to zero and leave the mocap_dataloader num_workers at 1, I get an error (I checked that the mocap data directory is correct):
Epoch 0: 0%| | 0/2499 [00:00<?, ?it/s
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 114, in _main
Traceback (most recent call last):
File "train/train_prohmr.py", line 63, in <module>
prepare(preparation_data)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 225, in prepare
trainer.fit(model, datamodule=data_module)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 553, in fit
_fixup_main_from_path(data['init_main_from_path'])
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
self._run(model)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 918, in _run
run_name="__mp_main__")
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 263, in run_path
self._dispatch()
pkg_name=pkg_name, script_name=fname)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 986, in _dispatch
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 85, in _run_code
self.accelerator.start_training(self)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\accelerators\accelerator.py", line 92, in start_training
exec(code, run_globals)
File "C:\Users\DM\Documents\Code\avatar-pose\ProHMR\train\train_prohmr.py", line 63, in <module>
self.training_type_plugin.start_training(trainer)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\plugins\training_type\training_type_plugin.py", line 161, in start_training
trainer.fit(model, datamodule=data_module)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 553, in fit
self._results = trainer.run_stage()
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 996, in run_stage
self._run(model)
return self._run_train()
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1045, in _run_train
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 918, in _run
self.fit_loop.run()
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\loops\base.py", line 111, in run
self._dispatch()
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 986, in _dispatch
self.advance(*args, **kwargs)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\loops\fit_loop.py", line 200, in advance
epoch_output = self.epoch_loop.run(train_dataloader)
self.accelerator.start_training(self)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\loops\base.py", line 111, in run
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\accelerators\accelerator.py", line 92, in start_training
self.advance(*args, **kwargs)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\loops\epoch\training_epoch_loop.py", line 118, in advance
self.training_type_plugin.start_training(trainer)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\plugins\training_type\training_type_plugin.py", line 161, in start_training
_, (batch, is_last) = next(dataloader_iter)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\profiler\base.py", line 104, in profile_iterable
self._results = trainer.run_stage()
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 996, in run_stage
value = next(iterator)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\supporters.py", line 668, in prefetch_iterator
return self._run_train()
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1045, in _run_train
last = next(it)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\supporters.py", line 589, in __next__
self.fit_loop.run()
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\loops\base.py", line 111, in run
return self.request_next_batch(self.loader_iters)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\supporters.py", line 575, in loader_iters
self.advance(*args, **kwargs)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\loops\fit_loop.py", line 200, in advance
self._loader_iters = self.create_loader_iters(self.loaders)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\supporters.py", line 633, in create_loader_iters
epoch_output = self.epoch_loop.run(train_dataloader)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\loops\base.py", line 111, in run
return apply_to_collection(loaders, Iterable, iter, wrong_dtype=(Sequence, Mapping))
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\utilities\apply_func.py", line 105, in apply_to_collection
self.advance(*args, **kwargs)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\loops\epoch\training_epoch_loop.py", line 118, in advance
v, dtype, function, *args, wrong_dtype=wrong_dtype, include_none=include_none, **kwargs
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\utilities\apply_func.py", line 96, in apply_to_collection
_, (batch, is_last) = next(dataloader_iter)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\profiler\base.py", line 104, in profile_iterable
return function(data, *args, **kwargs)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\supporters.py", line 234, in __iter__
value = next(iterator)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\supporters.py", line 668, in prefetch_iterator
self._loader_iter = iter(self.loader)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 359, in __iter__
last = next(it)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\supporters.py", line 589, in __next__
return self._get_iterator()
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 305, in _get_iterator
return self.request_next_batch(self.loader_iters)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\supporters.py", line 575, in loader_iters
return _MultiProcessingDataLoaderIter(self)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 918, in __init__
self._loader_iters = self.create_loader_iters(self.loaders)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\supporters.py", line 633, in create_loader_iters
w.start()
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 112, in start
return apply_to_collection(loaders, Iterable, iter, wrong_dtype=(Sequence, Mapping))
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\utilities\apply_func.py", line 105, in apply_to_collection
self._popen = self._Popen(self)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 223, in _Popen
v, dtype, function, *args, wrong_dtype=wrong_dtype, include_none=include_none, **kwargs
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\utilities\apply_func.py", line 96, in apply_to_collection
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 322, in _Popen
return function(data, *args, **kwargs)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\supporters.py", line 234, in __iter__
return Popen(process_obj)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
self._loader_iter = iter(self.loader)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 359, in __iter__
reduction.dump(process_obj, to_child)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\multiprocessing\reduction.py", line 60, in dump
return self._get_iterator()
ForkingPickler(file, protocol).dump(obj)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 305, in _get_iterator
BrokenPipeError: [Errno 32] Broken pipe
return _MultiProcessingDataLoaderIter(self)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 918, in __init__
w.start()
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\multiprocessing\popen_spawn_win32.py", line 46, in __init__
prep_data = spawn.get_preparation_data(process_obj._name)
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
_check_not_importing_main()
File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
is not going to be frozen to produce an executable.''')
RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
**This looks like a multiprocessing error? (The RuntimeError mentions using fork to start child processes.)**
Also, could you tell me what setting num_workers to zero means? Does it mean that everything runs in a single process?
Setting num_workers=0 means that it uses the main process to load data. For num_workers=n>0, n new processes are spawned to load the data. I think we can close this issue for now.
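For readers hitting the same RuntimeError on Windows: the traceback above shows spawn.py re-running the training script through runpy.run_path with run_name="__mp_main__", so any top-level code executes again inside each worker. The sketch below imitates that re-import with the standard library only and starts no real processes. The script text and the names SCRIPT, run_as and train_sketch.py are hypothetical and are not taken from ProHMR.

```python
import os
import runpy
import tempfile

# Hypothetical stand-in for a training script (not the real train_prohmr.py).
SCRIPT = '''
events = []
events.append("top-level code ran")      # executes on every (re-)import
if __name__ == "__main__":
    events.append("training started")    # executes only when run as the main script
'''


def run_as(run_name):
    """Execute SCRIPT under the given module name and return what it recorded."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "train_sketch.py")
        with open(path, "w") as f:
            f.write(SCRIPT)
        # Same call that multiprocessing's spawn.py makes in the traceback above.
        return runpy.run_path(path, run_name=run_name)["events"]


parent_events = run_as("__main__")     # like `python train_sketch.py`
child_events = run_as("__mp_main__")   # like a spawned worker re-importing it
print(parent_events)
print(child_events)
```

Only the guarded line is skipped in the simulated worker. For a script like the one in this thread, that would presumably mean moving the trainer.fit(...) call under an `if __name__ == '__main__':` guard, as the error message itself suggests.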
Hi, for text-to-image generation I want to switch to my own dataset, which is in COCO format. How can I get the _cocoval.npz file for my own dataset?
Hello, I want to train using COCO only. I changed dataset.yaml and prohmr.yaml as follows:
and I get this error:
trainer.fit(model, datamodule=data_module)
TypeError: cannot serialize '_io.BufferedReader' object
which is raised from File "train/train_prohmr.py", line 63, in <module>. My environment is Win10.
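A possible reading of this TypeError, offered as an assumption rather than a diagnosis: on Windows, worker processes are started with spawn, so the objects handed to them are pickled, and an open file handle cannot be pickled. The thread does not show which object holds the handle. The sketch below uses two hypothetical classes, EagerDataset and LazyDataset, to reproduce the failure with the standard library and to show one common workaround, which is to store only the path and open the file on first use.

```python
import os
import pickle
import tempfile


class EagerDataset:
    """Hypothetical: opens the file in __init__, so it holds a BufferedReader."""

    def __init__(self, path):
        self.path = path
        self.handle = open(path, "rb")


class LazyDataset:
    """Hypothetical: keeps only the path and opens the file on first access."""

    def __init__(self, path):
        self.path = path
        self.handle = None

    def read(self):
        if self.handle is None:
            self.handle = open(self.path, "rb")
        self.handle.seek(0)
        return self.handle.read()


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False on a pickling TypeError."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False


# Create a small temporary file to stand in for a dataset file.
fd, path = tempfile.mkstemp()
os.write(fd, b"demo bytes")
os.close(fd)

eager = EagerDataset(path)
lazy = LazyDataset(path)
eager_ok = can_pickle(eager)   # the open handle blocks pickling
lazy_ok = can_pickle(lazy)     # no handle has been opened yet

# A pickled copy (as a worker would receive it) opens its own handle.
restored = pickle.loads(pickle.dumps(lazy))
data = restored.read()

eager.handle.close()
restored.handle.close()
os.remove(path)
print(eager_ok, lazy_ok, data)
```

If the dataset classes in use do open files in their constructors, deferring the open call in this way, or running with num_workers=0, would be the two options to try first.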