open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

Train with Negative Dataset #2399

Closed mdv3101 closed 4 years ago

mdv3101 commented 4 years ago

I am trying to train Cascade Mask RCNN with a negative dataset (images that contain only background). When loading the data, I noticed that all the negative images are skipped, so training uses only positive images. Also, in the COCO dataset format, should I leave the bounding box empty or set it to [0,0,0,0] for images with no objects? The same goes for annotations: should I add an entry with area=0, or omit annotations entirely for images with no bounding boxes? Can you please suggest a solution?

xvjiarui commented 4 years ago

Hi @mdv3101, you may try setting filter_empty_gt=False in the dataset config.
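
A minimal sketch of where this flag might sit in a config, assuming a COCO-style training set. The paths are placeholders and the rest of the dataset settings (pipeline, batch size, and so on) are omitted; only `filter_empty_gt` comes from the suggestion above.

```python
# Hypothetical config fragment; paths are placeholders, not from this issue.
data = dict(
    train=dict(
        type='CocoDataset',
        ann_file='data/coco/annotations/train.json',  # placeholder path
        img_prefix='data/coco/train/',                # placeholder path
        # Keep images that have no ground-truth boxes instead of dropping them.
        filter_empty_gt=False,
    ),
)
```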

mdv3101 commented 4 years ago

Hi @xvjiarui, can you please suggest the correct format for negative data in the JSON files (I am using the COCO dataset format)? What should the bbox values be ([0,0,0,0] or empty)? And what about annotations: should we add an annotation with '0' values for negative images, or make no entry at all?

After setting filter_empty_gt=False, I am getting the following error.

2020-04-06 12:45:38,006 - mmdet - INFO - Start running, host: madhav3101@gnode53, work_dir: /ssd_scratch/cvit/madhav/train_dataset/coco/logs
2020-04-06 12:45:38,006 - mmdet - INFO - workflow: [('train', 1)], max: 200 epochs
loading annotations into memory...
Done (t=0.30s)
creating index...
index created!
Traceback (most recent call last):
  File "tools/train.py", line 141, in <module>
    main()
  File "tools/train.py", line 137, in main
    meta=meta)
  File "/home/madhav3101/pytorch-codes/mmdetection/mmdet/apis/train.py", line 111, in train_detector
    meta=meta)
  File "/home/madhav3101/pytorch-codes/mmdetection/mmdet/apis/train.py", line 242, in _non_dist_train
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/madhav3101/torch-env/lib/python3.7/site-packages/mmcv/runner/runner.py", line 359, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/madhav3101/torch-env/lib/python3.7/site-packages/mmcv/runner/runner.py", line 263, in train
    self.model, data_batch, train_mode=True, **kwargs)
  File "/home/madhav3101/pytorch-codes/mmdetection/mmdet/apis/train.py", line 75, in batch_processor
    losses = model(**data)
  File "/home/madhav3101/torch-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/madhav3101/torch-env/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/madhav3101/torch-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/madhav3101/pytorch-codes/mmdetection/mmdet/core/fp16/decorators.py", line 49, in new_func
    return old_func(*args, **kwargs)
  File "/home/madhav3101/pytorch-codes/mmdetection/mmdet/models/detectors/base.py", line 147, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/home/madhav3101/pytorch-codes/mmdetection/mmdet/models/detectors/cascade_rcnn.py", line 286, in forward_train
    mask_pred = mask_head(mask_feats)
  File "/home/madhav3101/torch-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/madhav3101/pytorch-codes/mmdetection/mmdet/core/fp16/decorators.py", line 49, in new_func
    return old_func(*args, **kwargs)
  File "/home/madhav3101/pytorch-codes/mmdetection/mmdet/models/mask_heads/fcn_mask_head.py", line 116, in forward
    x = self.upsample(x)
  File "/home/madhav3101/torch-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/madhav3101/torch-env/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 778, in forward
    output_padding, self.groups, self.dilation)
RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM

therajmaurya commented 4 years ago

Facing the same issue. Can anyone suggest a solution?

mdv3101 commented 4 years ago

Hi @xvjiarui, I have checked that the code works fine for Mask RCNN. But for Cascade RCNN it raises the error on negative images (as I mentioned earlier). Can you tell me what changes I should make?

yhcao6 commented 4 years ago

For example, if the input image has shape (3, 1088, 800) and contains no instances, then gt_bboxes: tensor([], device='cuda:0', size=(0, 4)) gt_masks: [array([], shape=(0, 1088, 800), dtype=uint8)]

Mask heads are trained only on positive RoIs. If there are no gt instances, there are no positive RoIs, so the cropped feature is an empty tensor (its first dimension is 0), and running a convolution on such a tensor raises the error above.

PR #2280 is intended to resolve this problem. You can refer to it.

mdv3101 commented 4 years ago

Hi @yhcao6, as suggested, I have made changes in accordance with PR #2280, but it still shows the error for Cascade RCNN.

I also added the following check in cascade_rcnn.py at line 285:

                if mask_feats.shape[0] > 0:
                    mask_head = self.mask_head[i]
                    mask_pred = mask_head(mask_feats)
                    mask_targets = mask_head.get_target(sampling_results, gt_masks,
                                                        rcnn_train_cfg)
                    pos_labels = torch.cat(
                        [res.pos_gt_labels for res in sampling_results])
                    loss_mask = mask_head.loss(mask_pred, mask_targets, pos_labels)
                    for name, value in loss_mask.items():
                        losses['s{}.{}'.format(i, name)] = (
                            value * lw if 'loss' in name else value)

With this check the code runs, and I trained a model. But when I run test.py to generate the results, the output clearly suggests that the negative samples were not incorporated during the training phase. Can you please comment on this?

mdv3101 commented 4 years ago

The latest change made in #2280 has removed the errors, but the results are still poor. I have trained a similar Cascade Mask RCNN model with a TensorFlow implementation (Tensorpack). After closely comparing the results, it is clear that the model is not incorporating the negative dataset during the training phase. Kindly help.

yhcao6 commented 4 years ago

@mdv3101 Thanks for reporting the potential bug. #2280 supports forwarding empty tensors, but it is not aimed at resolving the logic of training on samples without instances. Could you give more details or point out where the current implementation is wrong? For the detailed logic of training with zero-ground-truth samples, please refer to https://github.com/open-mmlab/mmdetection/pull/1531 .

mdv3101 commented 4 years ago

@yhcao6 I am not quite sure where the implementation is wrong. To find out, I would have to go through the entire code, which will take me a while. But there is a chance that Deformable Convolution has some issue (I am using Cascade Mask RCNN with FPN and Deformable Convolution). #1531 clearly states that the issue has been resolved for Cascade RCNN, so maybe it has something to do with Deformable Convolution. I am looking into the code, but in the meantime, it would be very helpful if you could also go through it.

mdv3101 commented 4 years ago

I have been training a Cascade Mask RCNN (with FPN) model on a single class. The model works fine, but it detects a certain type of false positive. These false-positive objects share the same characteristics. I tried adding them to my training dataset as negative samples. After training on this new dataset (including the negative samples), precision improved, but these false positives still do not vanish completely. Can you suggest which hyper-parameter I should tune to train the model more aggressively on the negative dataset?

yhcao6 commented 4 years ago

Maybe just increase the loss weight of these negative samples? I am not sure though.

mdv3101 commented 4 years ago

@xvjiarui In the current version, when using a custom dataset, the model filters out images with no annotations here. Is there a particular reason for this, or should we add a check before filtering out the images?
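
To make the question concrete, the filtering under discussion behaves roughly like the simplified sketch below. This is an illustration only, not mmdetection's actual code; `select_image_ids` and the toy data are hypothetical names introduced here.

```python
def select_image_ids(images, annotations, filter_empty_gt=True):
    """Return the ids of images kept for training (simplified illustration)."""
    # Ids of images that have at least one annotation.
    annotated = {ann['image_id'] for ann in annotations}
    kept = []
    for img in images:
        if filter_empty_gt and img['id'] not in annotated:
            continue  # a negative (background-only) image is dropped here
        kept.append(img['id'])
    return kept


# Toy data: image 1 has one box, images 2 and 3 are negative images.
images = [{'id': 1}, {'id': 2}, {'id': 3}]
annotations = [{'id': 10, 'image_id': 1, 'bbox': [5, 5, 20, 20]}]

print(select_image_ids(images, annotations, filter_empty_gt=True))   # [1]
print(select_image_ids(images, annotations, filter_empty_gt=False))  # [1, 2, 3]
```

With the flag enabled, only the annotated image survives, which matches the symptom of negative images being skipped at load time.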

JosonChan1998 commented 4 years ago

@mdv3101 Hi! Could you please describe the COCO format for negative examples in more detail?

wingskh commented 4 years ago

Sorry, did you solve the problem of the bbox values for negative images?

mdv3101 commented 4 years ago

Hi @JosonChan1998, @wingskh, I trained a model by providing no annotations for negative images. In the COCO format, there is no 'bbox' entry for negative images.
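
As a sketch of what that could look like in practice: the negative image appears under "images" but has no matching entry under "annotations". The file names, sizes, category, and the helper `negative_image_ids` are made up for illustration.

```python
import json

# Hypothetical COCO-style dataset with one positive and one negative image.
coco = {
    "images": [
        {"id": 1, "file_name": "positive.jpg", "width": 800, "height": 1088},
        {"id": 2, "file_name": "negative.jpg", "width": 800, "height": 1088},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            "bbox": [100, 150, 200, 300],  # [x, y, width, height]
            "area": 60000,                 # 200 * 300
            "iscrowd": 0,
            "segmentation": [[100, 150, 300, 150, 300, 450, 100, 450]],
        },
        # No entry for image 2: the negative image has no bbox and no annotation.
    ],
    "categories": [{"id": 1, "name": "object"}],
}


def negative_image_ids(coco_dict):
    """Return ids of images that no annotation refers to."""
    annotated = {ann["image_id"] for ann in coco_dict["annotations"]}
    return [img["id"] for img in coco_dict["images"] if img["id"] not in annotated]


print(negative_image_ids(coco))  # [2]
serialized = json.dumps(coco, indent=2)  # text that could be written to a .json file
```

Running a check like `negative_image_ids` on the annotation file before training is one way to confirm that the negative images are actually present in it.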

aymanaboghonim commented 1 year ago

I am facing the same issue. Could anyone help me? When I set filter_empty_gt=True, training starts normally, but it fails when it is set to False. Here is the error message (image). @mdv3101 @xvjiarui @wingskh