Test/Demo generates blank results with Faster R-CNN trained on ECP and CityPersons; in addition, the Faster R-CNN HRNet model does not converge.

AndyVerne commented 2 years ago

Thanks for your error report and we appreciate it a lot.

Checklist

- [x] I have searched related issues but could not get the expected help.
- [x] The bug has not been fixed in the latest version.

**Describe the bug**
A clear and concise description of what the bug is.

When I trained the Faster R-CNN model with `python tools/train.py configs/elephant/cityperson/faster_rcnn_hrnet.py`, the trained model generated blank results like the ones below. The same thing happened when I trained on ECP instead, with `python tools/train.py configs/elephant/eurocity/faster_rcnn_hrnet.py`.

By contrast, when I train Cascade Mask R-CNN with `python tools/train.py configs/elephant/cityperson/cascade_hrnet.py`, everything works.
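One way to tell whether "blank" means the detector produced no boxes at all, or only boxes that fall below the drawing threshold, is to inspect the raw detection result before it is visualized. The sketch below assumes the mmdetection v1 result format (the traceback in this report shows Pedestron runs on `mmdet`): a list with one `(N, 5)` array per class, each row `[x1, y1, x2, y2, score]`, or a `(bbox_result, segm_result)` tuple for mask models. The `0.3` threshold is an assumption; check `tools/demo.py` for the value it actually uses.

```python
import numpy as np


def count_confident_detections(result, score_thr=0.3):
    """Count boxes per class whose score is >= score_thr (mmdet-v1-style result assumed)."""
    if isinstance(result, tuple):  # mask models return (bbox_result, segm_result)
        result = result[0]
    counts = []
    for class_dets in result:
        dets = np.asarray(class_dets, dtype=np.float32).reshape(-1, 5)
        counts.append(int((dets[:, 4] >= score_thr).sum()))
    return counts


def highest_score(result):
    """Return the best score over all classes, or None if there are no boxes."""
    if isinstance(result, tuple):
        result = result[0]
    scores = [np.asarray(d, dtype=np.float32).reshape(-1, 5)[:, 4] for d in result]
    scores = np.concatenate(scores) if scores else np.empty(0, dtype=np.float32)
    return float(scores.max()) if scores.size else None


# Hypothetical output of a single-class (pedestrian) detector.
fake_result = [np.array([[10, 20, 50, 120, 0.12],
                         [60, 30, 90, 140, 0.25]], dtype=np.float32)]
print(count_confident_detections(fake_result))  # [0]
print(highest_score(fake_result))  # 0.25
```

If the counts are zero but `highest_score` is well above zero, the model is detecting something and only the visualization threshold hides it; if there are no boxes at all, the problem lies in training.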

I really have no clue why this happens. Any help is appreciated.

**Reproduction**

1. What command or script did you run?

Training commands:
```
python tools/train.py configs/elephant/cityperson/faster_rcnn_hrnet.py
python tools/train.py configs/elephant/cityperson/cascade_hrnet.py
```

Demo commands:
```
python tools/demo.py configs/elephant/cityperson/faster_rcnn_hrnet.py ./work_dirs/cityperson_faster_rcnn_hrnetv2p_w32/epoch_3.pth.stu demo/ result_demo_faster_r-cnn/
python tools/demo.py configs/elephant/cityperson/cascade_hrnet.py ./work_dirs/cityperson_cascade_rcnn_hrnetv2p_w32/epoch_3.pth.stu demo/ result_demo/
```

2. Did you make any modifications on the code or config? Did you understand what you have modified?
**No**
3. What dataset did you use?
Tested on ECP and CityPersons; the Faster R-CNN models trained on either dataset do not work.

**Environment**
- OS: Ubuntu 16.04.6
- GCC: 5.4.0
- PyTorch version:
- How you installed PyTorch: pip
- GPU model: V100
- CUDA and CUDNN version:
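The PyTorch and CUDA/cuDNN fields in the template were left empty. A minimal sketch for collecting them from the training environment, assuming it is run with the same Python interpreter used for training; it only uses standard PyTorch attributes and degrades gracefully if PyTorch is not installed.

```python
import platform


def collect_env():
    """Gather the version fields the issue template asks for."""
    info = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        info["torch"] = None  # PyTorch is not installed in this interpreter
        return info
    info["torch"] = torch.__version__
    info["cuda"] = torch.version.cuda  # None for CPU-only builds
    info["cudnn"] = (torch.backends.cudnn.version()
                     if torch.backends.cudnn.is_available() else None)
    info["gpu"] = (torch.cuda.get_device_name(0)
                   if torch.cuda.is_available() else None)
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print("{}: {}".format(key, value))
```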

**Error traceback**
If applicable, paste the error traceback here.

Pedestron/tools/../mmdet/apis/inference.py:39: UserWarning: Class names are not saved in the checkpoint's meta data, use COCO classes by default. warnings.warn('Class names are not saved in the checkpoint\'s '


***On the other hand, the model does not converge during training.***

**Bug fix**
If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!

AndyVerne commented 2 years ago

More specifically, `loss_rpn_cls` does not converge.
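To back up a "does not converge" report with numbers, one can compare the average RPN classification loss (`loss_rpn_cls` in mmdetection logs) at the start and end of training. This is a minimal sketch, assuming the work directory contains a JSON-lines training log with one object per line (as in mmdetection-style `*.log.json` files) and `"mode": "train"` on training records; if your log looks different, the parsing needs adapting.

```python
import json
from statistics import mean


def loss_trend(log_lines, key="loss_rpn_cls", window=50):
    """Return (mean of first `window` values, mean of last `window` values) for `key`.

    Records whose "mode" is not "train" (e.g. validation) and records
    without `key` (e.g. header lines) are skipped.
    """
    values = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("mode", "train") != "train" or key not in record:
            continue
        values.append(float(record[key]))
    if not values:
        raise ValueError("no '{}' values found in the log".format(key))
    return mean(values[:window]), mean(values[-window:])


# Hypothetical log excerpt: the loss never moves, i.e. it is not converging.
sample_log = [
    json.dumps({"mode": "train", "iter": 50 * i, "loss_rpn_cls": 0.5})
    for i in range(1, 5)
]
sample_log.append(json.dumps({"mode": "val", "epoch": 1}))
print(loss_trend(sample_log, window=2))  # (0.5, 0.5)
```

With a real log, pass the open file object, e.g. `with open(log_path) as f: print(loss_trend(f))`, where `log_path` is a hypothetical path to the JSON log in the work directory.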

hasanirtiza commented 2 years ago

Did you try changing the hyperparams? Just to be clear: you can train Cascade R-CNN, but not Faster R-CNN?

AndyVerne commented 2 years ago

Thanks for the reply. I didn't change the hyperparams. Cascade R-CNN trains fine, and Faster R-CNN with a ResNet-101 backbone also works; only Faster R-CNN with HRNet fails. I have no idea how to deal with it.

hasanirtiza commented 2 years ago

Then it is most probably the hyperparams. Play around with the learning rate: the learning rates in this repo are set for 8 GPUs. If you train with fewer GPUs, adjust the learning rate using the linear scaling rule.
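The linear scaling rule keeps the learning rate proportional to the total batch size (number of GPUs times images per GPU). A minimal sketch follows; the numbers are placeholders based on mmdetection's documented default (`lr=0.02` for 8 GPUs with 2 images each), not values taken from the Pedestron HRNet configs, so read the actual `lr` and images-per-GPU setting (e.g. `imgs_per_gpu` in mmdetection-v1-style configs) from the config you train with.

```python
def scale_lr(base_lr, base_gpus, base_imgs_per_gpu, gpus, imgs_per_gpu):
    """Linear scaling rule: learning rate proportional to total batch size."""
    base_batch = base_gpus * base_imgs_per_gpu  # batch size the config was tuned for
    new_batch = gpus * imgs_per_gpu             # batch size actually used
    return base_lr * new_batch / base_batch


# Hypothetical numbers: tuned with lr=0.02 on 8 GPUs x 2 images,
# now trained on a single GPU with 2 images.
print(scale_lr(0.02, 8, 2, 1, 2))  # 0.0025
```

The rule is a starting point rather than a guarantee; if the loss still stalls at the scaled value, lowering it further is a reasonable next step.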

AndyVerne commented 2 years ago

Thank you so much, I really appreciate the replies! I will give it a try and post an update soon. :)

hasanirtiza / Pedestron, issue #138