Test code bug in training

mumianyuxin / M3DSSD

M3DSSD: Monocular 3D Single Stage Object Detector

MIT License


Test code bug in training #3

Open chexiaoyu opened 3 years ago

chexiaoyu commented 3 years ago

Thanks for your research! When I run python scripts/train_rpn_3d.py --config=kitti_3d_base --exp_name base, training runs normally, but a bug appears during testing. My torch version is 0.4.1. It looks like an object type error; have you ever encountered this bug? Thank you!

Epoch:9
acc/fg: 0.950
acc/bg: 0.997
misc/z: 0.641
misc/ry: 0.311
acc/iou: 0.856
loss/ttloss: 0.551
testing
  0%|          | 0/3769 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "scripts/train_rpn_3d.py", line 323, in <module>
    main(args)
  File "scripts/train_rpn_3d.py", line 288, in main
    iou_3d = test_kitti_3d(dataset_val, rpn_net, conf, results_path, paths.data, writer=writer)
  File "/**/**/model/M3DSSD-master/lib/rpn_util.py", line 1794, in test_kitti_3d
    aboxes = im_detect_3d(im, net, rpn_conf, imobj)
  File "/**/**/model/M3DSSD-master/lib/rpn_util.py", line 1462, in im_detect_3d
    bbox_x3d = bbox_x3d * bbox_stds[0, 4] + bbox_means[0, 4]
RuntimeError: Expected object of type torch.cuda.FloatTensor but found type torch.cuda.DoubleTensor for argument #2 'other'

chexiaoyu commented 3 years ago

I just fixed the bug. Modify lines 1444, 1445 and 1446 of rpn_util.py as follows:

anchors = torch.from_numpy(rpn_conf.anchors).cuda().float()
bbox_means = torch.from_numpy(rpn_conf.bbox_means).cuda().float()
bbox_stds = torch.from_numpy(rpn_conf.bbox_stds).cuda().float()
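Why the .float() cast helps: torch.from_numpy keeps the NumPy dtype, and NumPy arrays default to float64, so the means/stds become DoubleTensors while the network output is a FloatTensor; torch 0.4.1 refuses to mix the two, as the traceback shows. A minimal CPU-only sketch, assuming a hypothetical array bbox_stds_np standing in for rpn_conf.bbox_stds (its shape here is illustrative only):

```python
import numpy as np
import torch

# Hypothetical stand-in for rpn_conf.bbox_stds; NumPy defaults to float64
bbox_stds_np = np.ones((1, 7))

as_double = torch.from_numpy(bbox_stds_np)        # dtype follows NumPy: torch.float64
as_float = torch.from_numpy(bbox_stds_np).float() # explicit cast to torch.float32

# Stand-in for the network's float32 regression output
bbox_x3d = torch.rand(3)
bbox_x3d = bbox_x3d * as_float[0, 4] + as_float[0, 4]  # both operands are float32

print(as_double.dtype, as_float.dtype, bbox_x3d.dtype)
```

The original code additionally calls .cuda(); the cast behaves the same on GPU tensors.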

Moreover, at line 1516, sorted_inds = (-aboxes[:, 4]).argsort() fails because the tensor doesn't support argsort() in torch 0.4.1; use torch.sort() instead. Finally, how can I train with multiple GPUs?
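A possible torch.sort() replacement for that line, sketched under the assumption (taken from the original expression) that column 4 of aboxes holds the detection score. sort_by_score_desc is a hypothetical helper name; torch.sort returns a (values, indices) pair, and descending=True replaces the negation trick:

```python
import torch


def sort_by_score_desc(aboxes, score_col=4):
    # Hypothetical helper: indices of boxes ordered from highest to lowest score.
    # Equivalent to (-aboxes[:, score_col]).argsort(), up to the order of ties.
    _, sorted_inds = torch.sort(aboxes[:, score_col], descending=True)
    return sorted_inds


aboxes = torch.tensor([
    [0.0, 0.0, 1.0, 1.0, 0.2],
    [0.0, 0.0, 1.0, 1.0, 0.9],
    [0.0, 0.0, 1.0, 1.0, 0.5],
])
print(sort_by_score_desc(aboxes).tolist())  # [1, 2, 0]
```

Inline, the same fix is _, sorted_inds = torch.sort(aboxes[:, 4], descending=True).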

mumianyuxin commented 3 years ago

Thanks for your contribution to fixing the bug. I haven't tested multi-GPU training, but you can modify the code to support this feature.

revisitq commented 3 years ago


Hi! To train with multiple GPUs, modify the code at lib/core/init_training_model as follows:

        # Default to GPU 0 if no devices were specified
        if 'CUDA_VISIBLE_DEVICES' not in os.environ:
            os.environ['CUDA_VISIBLE_DEVICES'] = '0'
        # Visible devices are renumbered from 0, so pass indices 0..N-1
        device_ids = list(range(len(os.environ['CUDA_VISIBLE_DEVICES'].split(','))))
        network = torch.nn.DataParallel(network, device_ids=device_ids)
        network.to('cuda')

Then run CUDA_VISIBLE_DEVICES='your gpu device ids' python scripts/train_rpn_3d.py --config=config --exp_name=exp_name to train on multiple GPUs.
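The device_ids computation in the init_training_model snippet relies on CUDA renumbering the visible GPUs from 0, so CUDA_VISIBLE_DEVICES='2,3' must map to device_ids [0, 1], not [2, 3]. A small testable sketch of that mapping; visible_device_ids is a hypothetical helper, not part of the M3DSSD code, and it takes an optional dict so it can be checked without touching the real environment:

```python
import os


def visible_device_ids(env=None):
    # Hypothetical helper: relative device ids for torch.nn.DataParallel.
    # Falls back to '0' when CUDA_VISIBLE_DEVICES is unset, like the snippet above.
    env = os.environ if env is None else env
    devices = env.get('CUDA_VISIBLE_DEVICES', '0')
    physical = [d for d in devices.split(',') if d.strip()]
    # Visible GPUs are renumbered 0..N-1 inside the process
    return list(range(len(physical)))


print(visible_device_ids({'CUDA_VISIBLE_DEVICES': '2,3'}))  # [0, 1]
print(visible_device_ids({}))                               # [0]
```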