Cannot reproduce the MobileNetV1 audio tagging result from the PANNs paper; are there any tricks for training?

Hello, thanks for providing the source code and training data. I downloaded the AudioSet dataset from the Baidu network disk you provided and trained the MobileNetV1 model from scratch, following the steps in "Train PANNs from scratch". However, I cannot reproduce the result of the checkpoint you provided (MobileNetV1_mAP=0.389.pth). At iteration 234000, the loss is still 1.1358, the "Validate bal mAP" is 0.005, and the "Validate test mAP" is 0.005. Neither mAP value has changed during training, so the model does not seem to be converging. Could you please give me some guidance? Are there any tricks for training this model?

Looking forward to your reply~ Thank you.

qiuqiangkong / audioset_tagging_cnn

Cannot reproduce the MobileNetV1 audio tagging result from the PANNs paper; are there any tricks for training? #48