Questions on implementation of YOLOF

open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark

https://mmdetection.readthedocs.io

Apache License 2.0

Questions on implementation of YOLOF #7126

Open thisisi3 opened 2 years ago

thisisi3 commented 2 years ago

Dear team, in the README of YOLOF's config you mention that "sometimes there are large loss fluctuations and NAN". I am wondering whether this also happens in the original implementation (the detectron2 version), and whether you have found the cause of this issue.

hhaAndroid commented 2 years ago

@thisisi3 In our experiments, the d2 (detectron2) version produced NaN relatively rarely. We will investigate the reason in the near future. If you are interested, you are welcome to join the investigation.

thisisi3 commented 2 years ago

Sure, I am currently implementing YOLOF in PaddleDetection and will share my findings if I think they are valuable to this issue.

twmht commented 2 years ago

@hhaAndroid @thisisi3

Any update on this? I get NaN losses when training on WIDER FACE, even after reducing the learning rate. It is not easy to debug because it always happens in the middle of training.
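For a failure that only shows up mid-training, one option is to stop at the first non-finite loss term and record which term and which iteration it was. The following is a minimal sketch in plain Python, not code from mmdetection: `check_losses` and `NonFiniteLossError` are hypothetical helpers, and the loss names and values are made up for illustration. In a PyTorch loop the floats would come from `loss.item()`.

```python
import math


class NonFiniteLossError(RuntimeError):
    """Raised when a loss term becomes NaN or infinite (hypothetical helper)."""


def check_losses(losses, iteration):
    """Raise on the first non-finite loss term, naming it and the iteration.

    `losses` maps a loss name to a plain float.
    """
    for name, value in losses.items():
        if not math.isfinite(value):
            raise NonFiniteLossError(f"iteration {iteration}: {name} = {value}")


# Made-up loss history: the third step has a NaN box loss.
history = [
    {"loss_cls": 0.92, "loss_bbox": 0.51},
    {"loss_cls": 0.88, "loss_bbox": 0.49},
    {"loss_cls": 0.90, "loss_bbox": float("nan")},
]

failed_at = None
for it, step_losses in enumerate(history):
    try:
        check_losses(step_losses, it)
    except NonFiniteLossError as err:
        failed_at = it  # remember where training first went wrong
        print(err)
        break
```

Stopping at the first bad iteration makes it possible to dump the offending batch and intermediate tensors before the NaN spreads into the weights. PyTorch's `torch.autograd.set_detect_anomaly(True)` can additionally point at the backward operation that produced the NaN, at the cost of slower training.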

thisisi3 commented 2 years ago

In my experience, about 1 in every 6 YOLOF training runs on VOC ends up with an abnormal loss, so training should succeed most of the time. I have not trained YOLOF on WIDER FACE, so I can't give you any advice, sorry.

twmht commented 2 years ago

@thisisi3

Why is there an abnormal loss? Is this a bug?

thisisi3 commented 2 years ago

Hi @twmht, to be honest I haven't found the reason why this happens. Further investigation is needed.

twmht commented 2 years ago

@thisisi3

The NaN problem went away after I replaced the YOLOF head with the GFL head. However, the accuracy is not good. Still, this suggests there might be a bug in the YOLOF head.

thisisi3 commented 2 years ago

@twmht Interesting, I always thought the problem might come from the assigner, but it seems not.

twmht commented 2 years ago

Maybe it is caused by the assigner after all, since I use the ATSS assigner with the GFL head.

thisisi3 commented 2 years ago

@hhaAndroid, @twmht Check out the following post: @yjh0410 has probably found the cause of the abnormal loss. https://github.com/thisisi3/Paddle-YOLOF/issues/1#issuecomment-1115545926