Open liming-ai opened 2 years ago
Hello @mitming Could you provide the version of mmdetection and mmselfsup you used? I would like to reproduce this bug and explore the problem.
@jbwang1997 Thanks a lot! Please try DETR and Deformable DETR first; both fail to converge, with nan or inf grad_norm.
Package Version Source
-------------- --------- ----------------------------------------------
mmcls 0.23.2 https://github.com/open-mmlab/mmclassification
mmcv-full 1.6.1 https://github.com/open-mmlab/mmcv
mmdet 2.25.1 https://github.com/open-mmlab/mmdetection
mmsegmentation 0.27.0 http://github.com/open-mmlab/mmsegmentation
mmselfsup 0.9.2 https://github.com/open-mmlab/mmselfsup
I think the problem is that mmselfsup force-registers its own DefaultOptimizerConstructor into OPTIMIZER_BUILDERS, which overrides the original DefaultOptimizerConstructor in mmcv. Both DETR and Deformable DETR need to set custom_keys, which only takes effect in mmcv's DefaultOptimizerConstructor.
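The override described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyRegistry`, `OriginalConstructor`, `OverridingConstructor` and the `learning_rates` method are hypothetical stand-ins written for illustration, not the mmcv or mmselfsup implementations. It shows how force-registering a second class under the same name silently replaces the first, so a `custom_keys` entry (here a `backbone` key with `lr_mult=0.1`) stops having any effect.

```python
class ToyRegistry:
    """Hypothetical stand-in for a name -> class registry (not mmcv's Registry)."""

    def __init__(self):
        self._modules = {}

    def register(self, name, cls, force=False):
        # Without force, a duplicate name is an error; with force, the old
        # entry is silently replaced.
        if name in self._modules and not force:
            raise KeyError(f"{name} is already registered")
        self._modules[name] = cls

    def get(self, name):
        return self._modules[name]


class OriginalConstructor:
    """Toy constructor that honours paramwise_cfg['custom_keys']."""

    def __init__(self, base_lr, paramwise_cfg=None):
        self.base_lr = base_lr
        self.custom_keys = (paramwise_cfg or {}).get("custom_keys", {})

    def learning_rates(self, param_names):
        rates = {}
        for name in param_names:
            lr = self.base_lr
            for key, cfg in self.custom_keys.items():
                if key in name:  # simple substring match on the parameter name
                    lr = self.base_lr * cfg.get("lr_mult", 1.0)
                    break
            rates[name] = lr
        return rates


class OverridingConstructor:
    """Toy constructor that accepts paramwise_cfg but ignores custom_keys."""

    def __init__(self, base_lr, paramwise_cfg=None):
        self.base_lr = base_lr

    def learning_rates(self, param_names):
        return {name: self.base_lr for name in param_names}


def build(registry, base_lr, paramwise_cfg):
    # Look the constructor up by name, as a config-driven builder would.
    cls = registry.get("DefaultOptimizerConstructor")
    return cls(base_lr, paramwise_cfg)


params = ["backbone.conv1.weight", "bbox_head.fc.weight"]
cfg = {"custom_keys": {"backbone": {"lr_mult": 0.1}}}

registry = ToyRegistry()
registry.register("DefaultOptimizerConstructor", OriginalConstructor)
before = build(registry, 1e-4, cfg).learning_rates(params)

# A second package force-registers its own class under the same name.
registry.register("DefaultOptimizerConstructor", OverridingConstructor, force=True)
after = build(registry, 1e-4, cfg).learning_rates(params)

print("before override:", before)
print("after override: ", after)
```

In this toy setup the backbone learning rate is scaled down before the override and left at the base value afterwards, while the config itself is unchanged. That is the kind of silent difference the comment above attributes to the registry override.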
This bug needs to be fixed in mmselfsup, so it would be better to report it there as well.
BTW, this bug does not occur if you use the newest mmdet 3.0 and mmselfsup 1.0.
@jbwang1997 Thanks for your answer. In my experiments, this bug also affects other methods such as Mask R-CNN, FCOS and RetinaNet. Could you please check again?
@jbwang1997, I have no idea why this bug affects other methods such as Mask R-CNN and RetinaNet, since they do not need to use custom_keys. Could you please check that?
Hi @mitming. I'm trying to figure out the problem. It may take some time. If I find out the problem, I will report the results in this issue as soon as possible.
Thanks a lot! Looking forward to your reply!
Hi @ZwwWayne @BIGWangYuDong @RangiLyu @jbwang1997 @hhaAndroid @chhluo , happy mid-Autumn Festival (🐶)~
This issue is also open in mmselfsup.
Describe the Issue I tried to combine mmselfsup and mmdet; however, when I added:
I found that DETR's training fails to converge. I also added this line to other methods such as FCOS, RetinaNet and Mask R-CNN. With fixed random seeds, their loss values and training results change substantially. I have no clue what is causing this large discrepancy and hope to get some help from the community.
Reproduction
Add this line to any original mmdet config; the loss and results change substantially, even with a fixed random seed.
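When reproducing, it may help to check which class a registry actually returns for a given name. The helper below is a minimal sketch: `registered_origin` and `StandInConstructor` are hypothetical names introduced here, and the helper only assumes the registry exposes a `get(name)` method that returns a class or `None`. A plain dict is used as a stand-in so the snippet runs without mmcv installed; the mmcv usage in the trailing comment is an assumption about mmcv 1.x and is left commented out.

```python
def registered_origin(registry, name):
    """Return 'module.QualName' of the class stored under `name`, or None.

    `registry` is any object with a get(name) method that returns a class
    or None (a plain dict works as a stand-in).
    """
    cls = registry.get(name)
    if cls is None:
        return None
    return f"{cls.__module__}.{cls.__qualname__}"


class StandInConstructor:
    """Hypothetical placeholder for whatever class is registered."""


stand_in_registry = {"DefaultOptimizerConstructor": StandInConstructor}
origin = registered_origin(stand_in_registry, "DefaultOptimizerConstructor")
print(origin)

# Assumed usage with mmcv 1.x (not executed here):
# from mmcv.runner import OPTIMIZER_BUILDERS
# print(registered_origin(OPTIMIZER_BUILDERS, "DefaultOptimizerConstructor"))
```

Running the check once with the unmodified config and once with the extra line added would show whether the module path of the registered constructor changes.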