Open franklu323 opened 6 years ago
I met the same problem when I changed the cfg file and then converted the weights with `python convert.py -w yolov3.cfg yolov3.weights model_data/yolo_weights.h5`. It runs well with the yolo_weights.h5 converted from the unmodified cfg file, but the output scores are very low.
Same here. Previously the code worked fine even after I changed the cfg file, but now it keeps giving me a runtime error and val_loss stays NaN for the whole training stage. If I convert the weights with the original cfg, it works. If you solve this problem, please let me know too. Thank you very much. @18814181500 @qqwweee
@franklu323 After training in many different setups, loading yolo_weights.h5 with frozen layers works well now. I tried loading / not loading darknet53_weights.h5, with and without freezing the earlier layers, and got four different sets of trained weights, but none of them drew good boxes and the scores were always very low (<0.5); I can't tell why. Then I loaded yolo_weights with frozen layers and got nice scores and boxes. Loading yolo_weights without freezing the layers gave output slightly worse than with freezing, particularly with multiple objects in one picture. By the way, my dataset is only 183 pictures, the training runs for 500 epochs, and I only train one class.
If I don't modify the cfg file and just convert the model and continue training, there is no problem at all! But after modifying the cfg, the training loss is normal while the validation loss is always NaN. What could be the reason? I'm using the latest code, and I've also read the similar closed issues, which didn't solve it. Thanks @qqwweee
I am also getting the same error. Can anyone help? :(
@franklu323 Did you get it resolved?
@JingangLang I am also getting the same error. Did you get it resolved?
Without modifying the cfg file, converting the model and continuing training works fine! But after modifying the cfg, the training loss is normal while the validation loss is always NaN. What is going on? I'm using the latest code and have read the similar closed issues, which didn't help. @JingangLang have you solved this? Hoping for a reply, thanks.
@JingangLang I ran into this problem too: after modifying the config file, the loss is normal but val_loss stays NaN. I don't know where the problem is. Have you managed to solve it? Any advice would be appreciated. Thanks.
I didn't solve this problem; it seems you don't need to change the cfg file, and I don't know why.
Are you sure? In fact, the first time I did as you say, but the model wasn't effective.
@franklu323
Sorry to disturb you,
I am suffering from the same problem as you. With the original yolov3.cfg file it works normally and the val_loss is pretty good, but the model's detection scores are very low (<0.3) after the stage-1 training (50 epochs).
On the contrary, when I changed yolov3.cfg and revised the classes, filters, etc., the val_loss becomes NaN. I have no idea why. Have you solved it?
I am looking forward to your reply, many thanks.
Kind regards
Wei
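For what it's worth, a common cause of NaN after editing the cfg is a mismatch between `classes` and the `filters` value of the `[convolutional]` layer directly before each `[yolo]` layer: `filters` must be (classes + 5) * 3 for the three anchors per scale. A quick sanity check (plain Python; the helper name is mine, not part of the repo):

```python
def yolo_filters(num_classes, anchors_per_scale=3):
    """filters for the conv layer preceding each [yolo] layer:
    (x, y, w, h, objectness) + class scores, per anchor."""
    return anchors_per_scale * (num_classes + 5)

print(yolo_filters(1))   # 1 class   -> 18
print(yolo_filters(2))   # 2 classes -> 21
print(yolo_filters(80))  # COCO      -> 255
```

If any of the three `filters=` values in your edited cfg doesn't match this number, the converted .h5 has the wrong head shapes and training can blow up.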
Same issue. Val_loss is coming out NaN.
I met this problem too and I can't solve it. Val_loss is NaN. Could you give us some explanation? Best wishes. @qqwweee
If I don't modify the cfg file and just convert the model and continue training, there is no problem at all! But after modifying the cfg, the training loss is normal while the validation loss is always NaN. What could be the reason? I'm using the latest code, and I've also read the similar closed issues, which didn't solve it. Thanks @qqwweee
If you don't modify the cfg file, doesn't loading the h5 file throw an error when training on your own data?
Got the same problem with a custom yolov3 config (2 classes) and the default anchors. I configured the cfg according to this instruction: https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects and got val_loss NaN and very bad training.
On the other hand, with the default cfg everything looks fine, but then I can't load the model, per this issue: https://github.com/qqwweee/keras-yolo3/issues/48 @qqwweee @franklu323 @bodhwani @MinghuiJ
Same problem here too. @qqwweee
I was experiencing the same problem with tiny-yolo, and the reason was that the weights were wrong. I downloaded yolov3-tiny.weights from darknet along with yolov3-tiny.cfg and converted them with:
`python3 convert.py yolov3-tiny.cfg yolov3-tiny.weights tiny-yolo.h5`
The problem was resolved.
I am also facing the same issue. None of the suggestions worked. Any help would be appreciated. @qqwweee
When I change the .cfg file I run into the same issue: val_loss is NaN. But when I force it into the second training stage (unfreeze all the layers), val_loss goes back to normal, and if I then use those weights to go back to training stage 1, it is OK.
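The two-stage schedule mentioned above mirrors what keras-yolo3's train.py does: freeze the pretrained backbone first so the re-initialized heads can adapt, then unfreeze everything for fine-tuning. A minimal pure-Python sketch of just the freeze/unfreeze bookkeeping (the `Layer` class here is a stand-in for Keras layers; real code toggles `layer.trainable` the same way):

```python
class Layer:
    """Toy stand-in for a Keras layer with a trainable flag."""
    def __init__(self, name):
        self.name = name
        self.trainable = True

# a toy model: backbone convs plus the 3 output layers
layers = [Layer(f"conv_{i}") for i in range(10)]

# stage 1: freeze everything except the last 3 layers,
# so only the detection heads are updated at first
for layer in layers[:-3]:
    layer.trainable = False
print(sum(not l.trainable for l in layers))  # 7 frozen

# stage 2: unfreeze all layers for full fine-tuning
for layer in layers:
    layer.trainable = True
print(sum(not l.trainable for l in layers))  # 0 frozen
```

In Keras you must re-compile the model after changing `trainable` for the change to take effect, which is why train.py compiles once per stage.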
Hi, I have an issue training my own model. I only have 1 class; I already changed the cfg file to classes=1 and filters=18, and converted it to an h5 file. But during training the validation loss stays NaN (on both the old and new versions of the code). It worked fine before and I didn't change any of the code. On the new version of the code it trains with a RuntimeWarning and doesn't stop before 50 epochs; it only stops during fine-tuning (out of memory). On the previous version of the code, it stops before 10 epochs (early stop) with NaN validation loss.
Can anyone help me with this problem? I convert the weights with `python convert.py -w yolov3.cfg yolov3.weights model_data/yolo_weights.h5`, and I change all the relevant class numbers and filters before training. Thanks.
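If it helps anyone debugging the same thing, here is a rough cfg sanity check I would run after editing (written from scratch, not part of this repo): it scans the darknet-style cfg and, for each `[yolo]` section, compares the last `filters=` seen before it against the expected 3 * (classes + 5).

```python
def check_cfg(cfg_text):
    """Return (actual_filters, expected_filters) pairs,
    one per [yolo] section in a darknet-style cfg."""
    last_filters = None
    pending = []   # filters awaiting their [yolo]'s classes= line
    results = []
    for raw in cfg_text.splitlines():
        line = raw.split('#')[0].strip()   # drop comments
        if line.startswith('filters='):
            last_filters = int(line.split('=')[1])
        elif line == '[yolo]':
            pending.append(last_filters)
        elif line.startswith('classes=') and pending:
            classes = int(line.split('=')[1])
            results.append((pending.pop(0), 3 * (classes + 5)))
    return results

sample = """
[convolutional]
filters=18
[yolo]
classes=1
"""
print(check_cfg(sample))  # [(18, 18)] -> filters match
```

Any pair where the two numbers differ points at the `[convolutional]` block you still need to fix; in my experience that mismatch is a frequent source of the NaN val_loss reported in this thread.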