yuantn / MI-AOD

Code for Multiple Instance Active Learning for Object Detection, CVPR 2021
https://openaccess.thecvf.com/content/CVPR2021/papers/Yuan_Multiple_Instance_Active_Learning_for_Object_Detection_CVPR_2021_paper.pdf
Apache License 2.0

ssd training error #62

Closed abhigoku10 closed 2 years ago

abhigoku10 commented 2 years ago

@yuantn Thanks for your work. I tried to replicate the results for SSD and I am getting the error below:

2022-03-05 05:58:31,781 - mmdet - INFO - Epoch [5][1900/2000] lr: 1.000e-03, eta: 0:00:19, time: 0.195, data_time: 0.011, memory: 9797, l_det_cls: 2.3892, l_det_loc: 1.0699, l_imgcls: 0.1678, L_det: 3.6269
2022-03-05 05:58:41,529 - mmdet - INFO - Epoch [5][1950/2000] lr: 1.000e-03, eta: 0:00:09, time: 0.195, data_time: 0.011, memory: 9797, l_det_cls: 2.3126, l_det_loc: 1.0026, l_imgcls: 0.1642, L_det: 3.4793
2022-03-05 05:58:51,302 - mmdet - INFO - Epoch [5][2000/2000] lr: 1.000e-03, eta: 0:00:00, time: 0.195, data_time: 0.011, memory: 9797, l_det_cls: 2.3620, l_det_loc: 1.0524, l_imgcls: 0.1669, L_det: 3.5813
2022-03-05 05:58:51,386 - mmdet - INFO - Saving checkpoint at 5 epochs
2022-03-05 05:58:54,979 - mmdet - INFO - Start running, host: v /home///alod/MI-AOD/tools/work_dirs/MI-AOD_SSD/20220305_052541
2022-03-05 05:58:54,979 - mmdet - INFO - workflow: [('train', 1)], max: 1 epochs
Traceback (most recent call last):
  File "train.py", line 257, in main()
  File "train.py", line 192, in main
    train_detector(model, [datasets, datasets_u], [cfg, cfg_u],
  File "/home//zzz_ska10/alod/MI-AOD/mmdet/apis/train.py", line 122, in train_detector
    runner.run([data_loaders_L, data_loaders_U], cfg.workflow, cfg.total_epochs)
  File "/home//anaconda3/envs/miaod/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 192, in run
    epoch_runner([data_loaders[i], data_loaders_u[i]], kwargs)
  File "/home//anaconda3/envs/miaod/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 60, in train
    outputs = self.model.train_step(X_L, self.optimizer, kwargs)
  File "/home//anaconda3/envs/miaod/lib/python3.8/site-packages/mmcv/parallel/data_parallel.py", line 31, in train_step
    return self.module.train_step(inputs[0], kwargs[0])
  File "/home//zzz_ska10/alod/MI-AOD/mmdet/models/detectors/base.py", line 228, in train_step
    losses = self(data)
  File "/home//anaconda3/envs/miaod/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(input, *kwargs)
  File "/home///alod/MI-AOD/mmdet/core/fp16/decorators.py", line 51, in new_func
    return old_func(args, kwargs)
  File "/home///alod/MI-AOD/mmdet/models/detectors/base.py", line 162, in forward
    return self.forward_train(x, img_metas, kwargs)
  File "/home//zzz_ska10/alod/MI-AOD/mmdet/models/detectors/single_stage.py", line 83, in forward_train
    losses = self.bbox_head.forward_train(x, img_metas, y_loc_img, y_cls_img, y_loc_img_ignore)
  File "/home//zzz_ska10/alod/MI-AOD/mmdet/models/dense_heads/base_dense_head.py", line 81, in forward_train
    loss = self.L_wave_min(*loss_inputs, y_loc_img_ignore=y_loc_img_ignore)
  File "/home//zzz_ska10/alod/MI-AOD/mmdet/models/dense_heads/ssd_head.py", line 270, in L_wave_min
    if np.array([y_loc_img[i].sum() for i in range(len(y_loc_img))]).sum() < 0:
  File "/home//anaconda3/envs/miaod/lib/python3.8/site-packages/torch/_tensor.py", line 678, in array
    return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

Can you let me know how to solve this error?

yuantn commented 2 years ago

Thanks for your attention.

You can take a look at the TypeError, which is:

can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

So you can follow the suggestion, which is to replace

if np.array([y_loc_img[i].sum() for i in range(len(y_loc_img))]).sum() < 0

with

if np.array([y_loc_img[i].sum().cpu() for i in range(len(y_loc_img))]).sum() < 0
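
For anyone who wants to check the change outside the training loop, here is a minimal sketch of the same pattern. The `y_loc_img` list below is a hypothetical stand-in (two small box tensors), not the real data from `ssd_head.py`, and the helper names `total_via_numpy` and `total_on_device` are made up for illustration. It falls back to the CPU when no GPU is available, in which case the original error would not appear at all.

```python
import numpy as np
import torch

# Hypothetical stand-in for y_loc_img: one box tensor per image.
device = "cuda" if torch.cuda.is_available() else "cpu"
y_loc_img = [
    torch.ones(2, 4, device=device),   # sums to 8
    torch.zeros(3, 4, device=device),  # sums to 0
]


def total_via_numpy(boxes):
    # Same shape as the suggested fix: copy each scalar sum to host
    # memory with .cpu() before NumPy converts it.
    return np.array([b.sum().cpu() for b in boxes]).sum()


def total_on_device(boxes):
    # Alternative sketch (not from the thread): reduce in torch and
    # convert to a Python float only once, at the end.
    return torch.stack([b.sum() for b in boxes]).sum().item()


print(total_via_numpy(y_loc_img), total_on_device(y_loc_img))
```

Both helpers give the same total. The second one avoids a separate device-to-host copy per image, which may matter for long lists, but the one-line `.cpu()` change above is the smaller edit to the existing code.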