titu1994 / keras-efficientnets

Keras Implementation of EfficientNets

model is collapsing or giving nan loss while finetuning #7

Closed mkulariya1 closed 5 years ago

mkulariya1 commented 5 years ago

I am fine-tuning the pretrained models for my work. B0 and B2 with image size 224x224 work fine, but B3, B4, and B5 do not: B3 and B5 collapse in the first epoch, giving an extremely low loss together with very low accuracy (loss 1.150e-07 and accuracy 0.1085, the same on validation), and B4 gives NaN loss after a couple of epochs. Sequence:

  1. Load the model without the last layer.
  2. Add global pooling, dropout, a dense layer, and softmax.
  3. Train only the new last layer for 5 epochs (works fine in all cases).
  4. Train the whole network (not working for B3, B4, B5). Tried a very low learning rate such as 1e-08, but it did not help. (A rough sketch of this sequence is below.)
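
A minimal sketch of that sequence, assuming the keras_efficientnets package; the exact constructor arguments may differ slightly from this repo's API, and `num_classes` and the dummy data are hypothetical placeholders used only for illustration:

```python
import numpy as np
from keras.layers import GlobalAveragePooling2D, Dropout, Dense
from keras.models import Model
from keras.optimizers import Adam
from keras.utils import to_categorical
from keras_efficientnets import EfficientNetB3

num_classes = 10  # hypothetical placeholder
# Dummy data purely for illustration; replace with the real dataset.
x_train = np.random.rand(32, 300, 300, 3).astype('float32')
y_train = to_categorical(np.random.randint(0, num_classes, 32), num_classes)

# 1. Load the pretrained model without the classification head.
base = EfficientNetB3((300, 300, 3), include_top=False, weights='imagenet')

# 2. Add global pooling, dropout, and a softmax dense layer.
x = GlobalAveragePooling2D()(base.output)
x = Dropout(0.3)(x)
out = Dense(num_classes, activation='softmax')(x)
model = Model(base.input, out)

# 3. Train only the new head for a few epochs, with the base frozen.
for layer in base.layers:
    layer.trainable = False
model.compile(optimizer=Adam(1e-3), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=8, epochs=5)

# 4. Unfreeze everything and fine-tune the whole network at a lower learning rate.
for layer in base.layers:
    layer.trainable = True
model.compile(optimizer=Adam(1e-5), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=8, epochs=5)
```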
titu1994 commented 5 years ago

Is this a custom dataset? Each of these models has its own input size.

mkulariya1 commented 5 years ago

@titu1994 yes it is a custom dataset, so you mean B3, B4 and B5 will not work with input size 224?

titu1994 commented 5 years ago

No they won't. They were built for much larger input sizes.
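
For reference, the native input resolutions from the EfficientNet paper (Tan & Le, 2019); this is just a lookup table, not part of this repo's API:

```python
# Native input resolutions for each EfficientNet variant.
EFFICIENTNET_INPUT_SIZES = {
    'B0': 224, 'B1': 240, 'B2': 260, 'B3': 300,
    'B4': 380, 'B5': 456, 'B6': 528, 'B7': 600,
}
```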

mkulariya1 commented 5 years ago

@titu1994 Tried B4 with input size 380 and still got NaN loss.

titu1994 commented 5 years ago

Interesting. An extremely small loss for certain models but not others is very weird. Can you try training with random initialization? That is, weights=None for the B3 model?
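
For example, reusing the hypothetical sketch from the issue body (exact constructor arguments may differ):

```python
# Sanity check: random initialization instead of the ImageNet weights.
base = EfficientNetB3((300, 300, 3), include_top=False, weights=None)
```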

mkulariya1 commented 5 years ago

Okay, I will try that.

mkulariya1 commented 5 years ago

(lr_finder_b3 plot attached) I tried B3 with input size 300; you can see the loss drops rapidly and the accuracy also goes down after a certain point (this result is with batch size 16). With batch size 8 it works fine (strange).

titu1994 commented 5 years ago

So it's an issue of batch size more than anything? BatchNorm with a batch size of 16 or below is known to be unstable. One thing you could try is a custom training loop that accumulates gradients over multiple batches before applying the update, for an effectively larger batch size, as sketched below.
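
A rough sketch of such a gradient-accumulation loop in tf.keras (generic, not code from this repo; `model`, `dataset`, and `accum_steps` are placeholders):

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(1e-4)
loss_fn = tf.keras.losses.CategoricalCrossentropy()
accum_steps = 4  # e.g. 4 batches of 8 -> effective batch size 32

# Running sum of gradients, one tensor per trainable variable.
accum_grads = [tf.zeros_like(v) for v in model.trainable_variables]

for step, (x, y) in enumerate(dataset):
    with tf.GradientTape() as tape:
        preds = model(x, training=True)
        # Scale so the summed gradients average over the accumulation window.
        loss = loss_fn(y, preds) / accum_steps
    grads = tape.gradient(loss, model.trainable_variables)
    accum_grads = [a + g for a, g in zip(accum_grads, grads)]

    if (step + 1) % accum_steps == 0:
        optimizer.apply_gradients(zip(accum_grads, model.trainable_variables))
        accum_grads = [tf.zeros_like(v) for v in model.trainable_variables]
```

Note that BatchNorm statistics are still computed per small batch, so this reduces gradient noise but does not by itself fix BatchNorm's small-batch instability.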

titu1994 commented 5 years ago

If you were able to train with larger batch sizes, would you mind closing the issue?

mkulariya1 commented 5 years ago

I was able to train, but not with a larger batch size; batch size 8 worked perfectly for me (although it is very slow).