Closed Yinhance closed 3 years ago
The full error log:
Traceback (most recent call last):
File "train.py", line 119, in <module>
main()
File "train.py", line 70, in main
data_loader, engine.state.iteration):
File "/home/yh/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 819, in __next__
return self._process_data(data)
File "/home/yh/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
data.reraise()
File "/home/yh/anaconda3/lib/python3.7/site-packages/torch/_utils.py", line 385, in reraise
raise self.exc_type(msg)
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/yh/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/home/yh/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/yh/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/yh/MSPN/dataset/JointsDataset.py", line 73, in __getitem__
raise ValueError('fail to read {}'.format(img_path))
ValueError: fail to read /home/yh/MSPN/dataset/COCO/images/val2014/COCO_val2014_000000435257.jpg
2021-04-03 00:56:54 deepserver3 train[26932] INFO
Start training with pytorch version 1.3.1
2021-04-03 00:56:54 deepserver3 train[26929] WARNING A exception occurred during Engine initialization, give up running process
Traceback (most recent call last):
File "train.py", line 119, in <module>
main()
File "train.py", line 70, in main
data_loader, engine.state.iteration):
File "/home/yh/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 819, in __next__
return self._process_data(data)
File "/home/yh/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
data.reraise()
File "/home/yh/anaconda3/lib/python3.7/site-packages/torch/_utils.py", line 385, in reraise
raise self.exc_type(msg)
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/yh/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/home/yh/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/yh/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/yh/MSPN/dataset/JointsDataset.py", line 73, in __getitem__
raise ValueError('fail to read {}'.format(img_path))
ValueError: fail to read /home/yh/MSPN/dataset/COCO/images/val2014/COCO_val2014_000000246589.jpg
2021-04-03 00:56:54 deepserver3 train[26930] WARNING A exception occurred during Engine initialization, give up running process
Traceback (most recent call last):
File "train.py", line 119, in <module>
main()
File "train.py", line 70, in main
data_loader, engine.state.iteration):
File "/home/yh/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 819, in __next__
return self._process_data(data)
File "/home/yh/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
data.reraise()
File "/home/yh/anaconda3/lib/python3.7/site-packages/torch/_utils.py", line 385, in reraise
raise self.exc_type(msg)
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/yh/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/home/yh/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/yh/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/yh/MSPN/dataset/JointsDataset.py", line 73, in __getitem__
raise ValueError('fail to read {}'.format(img_path))
ValueError: fail to read /home/yh/MSPN/dataset/COCO/images/val2014/COCO_val2014_000000259761.jpg
2021-04-03 00:56:54 deepserver3 train[26932] WARNING A exception occurred during Engine initialization, give up running process
Traceback (most recent call last):
File "train.py", line 119, in <module>
main()
File "train.py", line 70, in main
data_loader, engine.state.iteration):
File "/home/yh/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 819, in __next__
return self._process_data(data)
File "/home/yh/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
data.reraise()
File "/home/yh/anaconda3/lib/python3.7/site-packages/torch/_utils.py", line 385, in reraise
raise self.exc_type(msg)
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/yh/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/home/yh/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/yh/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/yh/MSPN/dataset/JointsDataset.py", line 73, in __getitem__
raise ValueError('fail to read {}'.format(img_path))
ValueError: fail to read /home/yh/MSPN/dataset/COCO/images/train2014/COCO_train2014_000000310013.jpg
Traceback (most recent call last):
File "/home/yh/anaconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/yh/anaconda3/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/yh/anaconda3/lib/python3.7/site-packages/torch/distributed/launch.py", line 253, in <module>
main()
File "/home/yh/anaconda3/lib/python3.7/site-packages/torch/distributed/launch.py", line 249, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/home/yh/anaconda3/bin/python', '-u', 'train.py', '--local_rank=3']' returned non-zero exit status 1
It seems that JointsDataset.py failed to read images from the COCO dataset. Please check that the image paths you provided are correct.
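One way to follow that advice is to check the reported files directly before launching training again. The snippet below is only a rough sketch: `check_image` and `find_bad_images` are hypothetical helpers, not part of MSPN, and the test (file exists, is non-empty, starts with the JPEG magic bytes) is an assumption about what "readable" means here, not what JointsDataset.py actually does.

```python
import os

JPEG_MAGIC = b"\xff\xd8"  # every JPEG file starts with these two bytes


def check_image(path):
    """Return None if the file looks like a readable JPEG, else a short reason."""
    if not os.path.isfile(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    with open(path, "rb") as f:
        if f.read(2) != JPEG_MAGIC:
            return "not a JPEG"
    return None


def find_bad_images(paths):
    """Map each problematic path to the reason it failed the check."""
    bad = {}
    for path in paths:
        reason = check_image(path)
        if reason is not None:
            bad[path] = reason
    return bad


if __name__ == "__main__":
    # Paths copied from the traceback in this issue.
    reported = [
        "/home/yh/MSPN/dataset/COCO/images/val2014/COCO_val2014_000000435257.jpg",
        "/home/yh/MSPN/dataset/COCO/images/train2014/COCO_train2014_000000310013.jpg",
    ]
    for path, reason in find_bad_images(reported).items():
        print(f"{reason}: {path}")
```

If every reported path comes back as "missing", the dataset directory layout probably differs from what the code expects; if they come back as "empty" or "not a JPEG", the download or extraction may be incomplete.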
Got it! The subprocess.CalledProcessError is not an independent error; the error before it is the decisive one.
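A small illustration of why that is the case, as a sketch rather than the launcher's real code: a parent process that starts a worker with the standard `subprocess` module only sees the non-zero exit status, so the `CalledProcessError` it raises is a consequence of whatever the worker printed last. `run_worker` is a hypothetical helper written for this example.

```python
import subprocess
import sys


def run_worker(code):
    """Run a Python snippet in a child process and report how it ended.

    Returns (exit_status, last_stderr_line). The last stderr line of a
    crashed Python process is the exception that actually caused the failure.
    """
    cmd = [sys.executable, "-c", code]
    try:
        subprocess.run(cmd, check=True, capture_output=True, text=True)
    except subprocess.CalledProcessError as err:
        # The parent only knows the child exited with a non-zero status;
        # the real cause is in the child's own traceback.
        lines = err.stderr.strip().splitlines()
        return err.returncode, lines[-1] if lines else ""
    return 0, ""


if __name__ == "__main__":
    status, cause = run_worker("raise ValueError('fail to read image.jpg')")
    print("exit status:", status)
    print("root cause:", cause)
```

In the log above the same pattern applies: the `ValueError: fail to read ...` lines from each worker are the root cause, and the final `CalledProcessError` only reports that a worker exited with status 1.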
This happens when I run the command `python -m torch.distributed.launch --nproc_per_node=4 train.py`.
How can I solve this problem? Thanks!