I am trying to train the distance_semantic_detection_motion model with the default params.yaml.
However, the training process always stalls after 1 epoch.
No error is reported; the process simply stops making progress and GPU utilization on all GPUs is 0%.
I am using the DP multi-GPU training setting because a single V100 GPU cannot fit a batch size of 22.
Problem:
epoch 0 | batch 0 | current lr 0.0001 | examples/s: 0.6 | loss: 215.47714 | time elapsed: 00h00m51s | time| CPU/GPU time: 8.8s/35.0s
epoch 0 | batch 300 | current lr 0.0001 | examples/s: 12.8 | loss: 20.83787 | time elapsed: 00h09m49s | time CPU/GPU time: 0.1s/529.1s
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
epoch 0 | Semantic IoU: 0.349
=> Saving semantic segmentation model weights with mean_iou of 0.349 at step 300 on 0 epoch.
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
=> Saving detection model weights with mean_AP of 0.010 at step 300 on 0 epoch.
=> meanAP per class in order: [0.03, 0.0, 0.0]
=> Detection val mAP 0.010
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
epoch 0 | Motion IoU: 0.524
=> Saving motion model weights with mean_iou of 0.524 at step 300 on 0 epoch.
When I press CTRL+C to stop the process, I get the following traceback:
^CTraceback (most recent call last):
  File "./main.py", line 109, in <module>
    main()
  File "./main.py", line 93, in main
    model.distance_semantic_detection_motion_train()
  File "/workspace/WoodScape/omnidet/train_distance_semantic_detection_motion.py", line 77, in distance_semantic_detection_motion_train
    self.save_best_detection_weights()
  File "/workspace/WoodScape/omnidet/train_distance_semantic_detection.py", line 134, in save_best_detection_weights
    self.args.input_height])
  File "/root/miniconda3/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/WoodScape/omnidet/train_detection.py", line 178, in detection_val
    outputs = non_max_suppression(outputs, conf_thres=conf_thres, nms_thres=nms_thres)
  File "/workspace/WoodScape/omnidet/train_utils/detection_utils.py", line 241, in non_max_suppression
    large_overlap = bbox_iou(detections[0, :4].unsqueeze(0), detections[:, :4]) > nms_thres
  File "/workspace/WoodScape/omnidet/train_utils/detection_utils.py", line 205, in bbox_iou
    b2_area = (b2_x2 - b2_x1 + 1) * (b2_y2 - b2_y1 + 1)
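For context, the last two frames are inside non_max_suppression and bbox_iou, so the interrupt landed in the detection validation step. Below is a minimal pure-Python sketch of what a greedy NMS loop with an inclusive-pixel IoU typically looks like. It is not the WoodScape implementation: the names bbox_iou_xyxy and greedy_nms, the (x1, y1, x2, y2, score) tuple layout, and the default threshold are assumptions for illustration, and per-class matching and box merging are left out.

```python
def bbox_iou_xyxy(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2), inclusive pixel coordinates.

    Hypothetical helper for illustration, not the repository's bbox_iou.
    """
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes give an empty intersection.
    inter_w = max(ix2 - ix1 + 1, 0)
    inter_h = max(iy2 - iy1 + 1, 0)
    inter_area = inter_w * inter_h

    # The "+ 1" follows the inclusive-pixel convention seen in the traceback line.
    area_a = (box_a[2] - box_a[0] + 1) * (box_a[3] - box_a[1] + 1)
    area_b = (box_b[2] - box_b[0] + 1) * (box_b[3] - box_b[1] + 1)

    return inter_area / (area_a + area_b - inter_area)


def greedy_nms(detections, nms_thres=0.5):
    """Keep the highest-scoring box, drop boxes overlapping it, repeat.

    detections: list of (x1, y1, x2, y2, score) tuples (assumed layout).
    """
    remaining = sorted(detections, key=lambda d: d[4], reverse=True)
    kept = []
    while remaining:
        best = remaining[0]
        kept.append(best)
        # Compare the best box against every other remaining box.
        remaining = [
            d for d in remaining[1:]
            if bbox_iou_xyxy(best[:4], d[:4]) <= nms_thres
        ]
    return kept


if __name__ == "__main__":
    dets = [(0, 0, 9, 9, 0.9), (1, 1, 10, 10, 0.8), (20, 20, 29, 29, 0.7)]
    print(greedy_nms(dets))
```

In a loop of this shape, each kept box is compared against all boxes still remaining, so the work grows roughly quadratically with the number of detections that pass the confidence threshold.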