thuml / Xlearn

Transfer Learning Library
465 stars 155 forks

Backward mmd loss = NaN #27

Open elenoka opened 5 years ago

elenoka commented 5 years ago

Hello, I used the Caffe implementation. Sometimes the MMD backward diff becomes NaN, and soon afterwards the whole network crashes. In my implementation, the data is sliced into two branches at the fc layers, source data and target data, and both of them are inputs to the MK-MMD loss layer. It works well in the beginning, but after some epochs the MK-MMD loss backward diff turns into NaN and the training process has to be stopped. Can you please tell me why this would happen? Thank you so much!

zhaobeile commented 4 years ago

You can use a warm-up strategy to solve this problem.
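
The reply does not say which kind of warm-up is meant. One common reading is to keep the weight on the MMD loss small at the start of training and raise it gradually, so that large early gradients from the MMD term are scaled down. The sketch below shows two such schedules in plain Python. The function names, the `gamma` default, and the step counts are hypothetical and are not part of Xlearn or Caffe; how to feed the resulting weight into a Caffe loss layer is not shown here.

```python
import math


def linear_warmup_weight(step, warmup_steps, max_weight=1.0):
    """Ramp the loss weight linearly from 0 to max_weight over warmup_steps.

    Hypothetical helper for illustration; not an Xlearn or Caffe function.
    """
    if warmup_steps <= 0:
        return max_weight
    # Clamp progress to [0, 1] so the weight stays at max_weight after warm-up.
    progress = min(max(step / warmup_steps, 0.0), 1.0)
    return max_weight * progress


def sigmoid_rampup_weight(step, total_steps, max_weight=1.0, gamma=10.0):
    """Ramp the loss weight smoothly from 0 towards max_weight.

    Uses 2 / (1 + exp(-gamma * p)) - 1 with p = step / total_steps.
    The value gamma = 10 is an assumed default, chosen only for illustration.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return max_weight * (2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0)


if __name__ == "__main__":
    # Print the weight that would multiply the MMD loss at a few steps.
    for step in (0, 250, 500, 1000, 2000):
        w_lin = linear_warmup_weight(step, warmup_steps=1000)
        w_sig = sigmoid_rampup_weight(step, total_steps=2000)
        print(f"step={step:5d}  linear={w_lin:.3f}  sigmoid={w_sig:.3f}")
```

In a training loop the total loss would then be formed as something like `cls_loss + weight * mmd_loss`, with `weight` recomputed at every step. If warm-up here instead refers to the learning rate, the same schedules can be applied to the learning rate rather than to the loss weight.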