yqyao / SSD_Pytorch

Supports different SSD variants and testing at different scales; also supports RefineDet.
MIT License
148 stars, 51 forks

error about iter and multiprocessing #17

Closed: haochange closed this issue 5 years ago

haochange commented 5 years ago

Epoch:3 || epochiter: 470/485|| arm_L: 8.8112 arm_C: 0.0169|| odm_L: 6.2153 odm_C: 2.6775|| loss: 17.7210||iteration time: 0.9821 sec. ||LR: 0.00010 || eta time: 1 day, 8:41:09
Epoch:3 || epochiter: 480/485|| arm_L: 8.6689 arm_C: 0.2845|| odm_L: 7.7286 odm_C: 2.3880|| loss: 19.0699||iteration time: 1.0188 sec. ||LR: 0.00010 || eta time: 1 day, 9:54:11
Epoch:4 || epochiter: 0/485|| arm_L: 8.8543 arm_C: 0.0937|| odm_L: 7.0829 odm_C: 2.2283|| loss: 18.2593||iteration time: 0.6582 sec. ||LR: 0.00010 || eta time: 21:54:03
Traceback (most recent call last):
  File "train.py", line 329, in <module>
    main()
  File "train.py", line 311, in main
    gamma, endepoch, cfg)
  File "train.py", line 90, in train
    for iteration, (imgs, targets, ) in enumerate(train_loader):
  File "/home/ubuntu/XXX/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 280, in __next__
    idx, batch = self._get_batch()
  File "/home/ubuntu/XXX/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 259, in _get_batch
    return self.data_queue.get()
  File "/usr/local/lib/python3.6/multiprocessing/queues.py", line 345, in get
    return _ForkingPickler.loads(res)
  File "/home/ubuntu/XXX/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 70, in rebuild_storage_fd
    fd = df.detach()
  File "/usr/local/lib/python3.6/multiprocessing/resource_sharer.py", line 58, in detach
    return reduction.recv_handle(conn)
  File "/usr/local/lib/python3.6/multiprocessing/reduction.py", line 182, in recv_handle
    return recvfds(s, 1)[0]
  File "/usr/local/lib/python3.6/multiprocessing/reduction.py", line 161, in recvfds
    len(ancdata))
RuntimeError: received 0 items of ancdata


batch size = 32, num_images = 15493, command: python train.py --cfg ./configs/refine_vgg_voc_512.yaml
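As a sanity check on this setup, the 485 in `epochiter: .../485` in the log above matches 15,493 images at batch size 32 if the final partial batch is kept, which is what PyTorch's `DataLoader` does with its default `drop_last=False`. That this repository leaves `drop_last` at its default is an assumption inferred from the number 485, and `iters_per_epoch` is a hypothetical helper written only for this check.

```python
import math


def iters_per_epoch(num_images, batch_size, drop_last=False):
    """Number of DataLoader iterations in one epoch (hypothetical helper)."""
    if drop_last:
        # The incomplete final batch is discarded.
        return num_images // batch_size
    # The incomplete final batch is still yielded as one smaller batch.
    return math.ceil(num_images / batch_size)


print(iters_per_epoch(15493, 32))                  # 485
print(iters_per_epoch(15493, 32, drop_last=True))  # 484
print(15493 - 484 * 32)                            # 5 images in the last batch
```

So each epoch runs iterations 0 to 484, and the crash in the first log happens right at the start of epoch 4.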

  (arm_conf): ModuleList(
    (0): Conv2d(512, 6, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): Conv2d(1024, 6, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (2): Conv2d(256, 6, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): Conv2d(256, 6, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  )
)
Traceback (most recent call last):
  File "train.py", line 329, in <module>
    main()
  File "train.py", line 311, in main
    gamma, endepoch, cfg)
  File "train.py", line 90, in train
    for iteration, (imgs, targets, ) in enumerate(train_loader):
  File "/home/ubuntu/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 280, in __next__
    idx, batch = self._get_batch()
  File "/home/ubuntu/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 259, in _get_batch
    return self.data_queue.get()
  File "/usr/local/lib/python3.6/multiprocessing/queues.py", line 345, in get
    return _ForkingPickler.loads(res)
  File "/home/ubuntu/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 70, in rebuild_storage_fd
    fd = df.detach()
  File "/usr/local/lib/python3.6/multiprocessing/resource_sharer.py", line 58, in detach
    return reduction.recv_handle(conn)
  File "/usr/local/lib/python3.6/multiprocessing/reduction.py", line 182, in recv_handle
    return recvfds(s, 1)[0]
  File "/usr/local/lib/python3.6/multiprocessing/reduction.py", line 161, in recvfds
    len(ancdata))
RuntimeError: received 0 items of ancdata

qq276399331 commented 5 years ago

I ran into this error once too, but when I ran the code again, the error went away. I'm not sure whether it is caused by PyTorch only allowing 1024 inputs.

haochange commented 5 years ago

the effect of only allowing 1024 inputs

What do you mean by "only allowing 1024 inputs"? @qq276399331

qq276399331 commented 5 years ago

I think this is a problem with PyTorch itself. Try running the code again; the error should not occur a second time. By 1024 I mean PyTorch's upper limit on the number of bytes.
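The thread does not confirm a root cause. One commonly reported explanation for `received 0 items of ancdata`, and a plausible meaning for "1024", is the per-process open-file limit (`ulimit -n`, often 1024 by default on Linux). Under PyTorch's default `file_descriptor` sharing strategy, shared-memory tensors coming back from DataLoader workers are passed as file descriptors over Unix sockets (the `resource_sharer` / `recvfds` frames in the tracebacks above), and that limit can run out. PyTorch's `torch.multiprocessing` documentation suggests raising the limit or switching to the `file_system` strategy. The sketch below does both. The helper names are hypothetical, and calling them at the top of `train.py` before the DataLoader is built is an assumption, not something this repository does.

```python
import resource


def raise_open_file_limit():
    """Raise the soft open-file limit to the hard limit (Unix only, hypothetical helper).

    Returns the (soft, hard) limits in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Only raise when the hard limit is finite and above the soft limit.
    if hard != resource.RLIM_INFINITY and soft < hard:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            # Some platforms reject very large values; keep the old limit.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def use_file_system_sharing():
    """Switch torch.multiprocessing to 'file_system' sharing (hypothetical helper).

    Returns True if the strategy was set, False if torch is missing
    or the strategy is unavailable on this platform.
    """
    try:
        import torch.multiprocessing as torch_mp
    except ImportError:
        return False
    if "file_system" not in torch_mp.get_all_sharing_strategies():
        return False
    torch_mp.set_sharing_strategy("file_system")
    return True


if __name__ == "__main__":
    print("open-file limit (soft, hard):", raise_open_file_limit())
    print("file_system sharing enabled:", use_file_system_sharing())
```

Either change alone may be enough. PyTorch's documentation notes that `file_system` avoids holding many descriptors open but can leave shared-memory files behind if processes die unexpectedly, which is why raising the limit is often tried first.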

haochange commented 5 years ago

Thank you so much!