Backward mmd loss = NaN

Hello, I used the Caffe implementation. Sometimes the MMD backward diff becomes NaN, and soon afterwards the whole network crashes. In my implementation, the data is sliced into two branches at the fc layers, source data and target data, and both are inputs to the MK-MMD loss layer. Training works well at first, but after some epochs the MK-MMD loss backward diff turns into NaN and training has to be stopped. Can you please tell me why this happens? Thank you so much!
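For reference, here is a minimal pure-Python sketch of a biased multi-kernel MMD² estimate between a source batch and a target batch, the quantity the MK-MMD loss layer computes from the two fc branches. It is not the Xlearn Caffe layer: the kernel family (a sum of Gaussians with bandwidths spaced by `kernel_mul` around the mean pairwise squared distance) and the names `mk_mmd2`, `kernel_mul`, `kernel_num` and `eps` are assumptions. It illustrates one plausible NaN source: when the bandwidth is derived from the data and the features collapse, the bandwidth goes to zero and the kernel divides by it, so the sketch clamps it with `eps`.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mk_mmd2(source, target, kernel_mul=2.0, kernel_num=5, eps=1e-8):
    """Biased multi-kernel MMD^2 between two batches (lists of feature vectors).

    Hypothetical sketch, not the Xlearn Caffe layer: the kernel is a sum of
    Gaussians whose bandwidths are spaced by `kernel_mul` around the mean
    pairwise squared distance of the combined batch.
    """
    data = source + target
    n, ns, nt = len(data), len(source), len(target)
    dists = [[sq_dist(data[i], data[j]) for j in range(n)] for i in range(n)]

    # Data-dependent base bandwidth: mean off-diagonal squared distance.
    base = sum(dists[i][j] for i in range(n) for j in range(n) if i != j) / (n * n - n)
    # Guard (assumption): if the fc features collapse, base -> 0 and the
    # kernel below would divide by zero, which yields inf/NaN in float code.
    base = max(base, eps) / kernel_mul ** (kernel_num // 2)
    bandwidths = [base * kernel_mul ** k for k in range(kernel_num)]

    def kern(i, j):
        # Multi-kernel value: sum of Gaussian kernels over all bandwidths.
        return sum(math.exp(-dists[i][j] / bw) for bw in bandwidths)

    xx = sum(kern(i, j) for i in range(ns) for j in range(ns)) / (ns * ns)
    yy = sum(kern(i, j) for i in range(ns, n) for j in range(ns, n)) / (nt * nt)
    xy = sum(kern(i, j) for i in range(ns) for j in range(ns, n)) / (ns * nt)
    return xx + yy - 2.0 * xy


src = [[0.0, 0.1], [0.2, 0.0]]
tgt = [[3.0, 3.1], [3.2, 2.9]]
print(mk_mmd2(src, tgt))  # positive: the two batches are far apart
print(mk_mmd2(src, src))  # identical batches give zero discrepancy
```

Without the `eps` clamp, a batch whose fc outputs are all identical would make every bandwidth zero; in a C++/CUDA layer that produces inf or NaN in the forward pass and, through the chain rule, in the backward diff.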

You can use the warm-up strategy to solve this problem.
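The reply does not spell out the warm-up. One common form, assumed here, keeps the weight of the MMD term near zero early in training and ramps it up, so the adaptation gradient cannot dominate while the fc features are still unstable. The sketch below uses the sigmoid-shaped schedule from DANN (Ganin and Lempitsky), λ(p) = 2 / (1 + exp(-γp)) - 1 with training progress p in [0, 1]; `mmd_weight`, `max_steps`, `target_weight` and `gamma` are hypothetical names. In Caffe, applying a changing weight to the MMD loss layer would need your own code; it is not something this sketch confirms Xlearn provides.

```python
import math


def mmd_weight(step, max_steps, target_weight=1.0, gamma=10.0):
    """Weight for the MMD loss at a given iteration.

    Assumed schedule (DANN-style, not confirmed for Xlearn):
    lambda(p) = target_weight * (2 / (1 + exp(-gamma * p)) - 1), p in [0, 1].
    """
    p = min(max(step / max_steps, 0.0), 1.0)  # training progress, clamped to [0, 1]
    return target_weight * (2.0 / (1.0 + math.exp(-gamma * p)) - 1.0)


# Hypothetical usage inside a training loop:
#   total_loss = cls_loss + mmd_weight(step, max_steps) * mmd_loss
for step in (0, 100, 500, 1000):
    print(step, round(mmd_weight(step, 1000), 4))
```

The weight starts at exactly 0 and approaches `target_weight` smoothly, so early iterations are driven by the classification loss alone. A learning-rate warm-up is another variant of the same idea; the choice here is only illustrative.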

thuml / Xlearn, issue #27