tmbdev / clstm

A small C++ implementation of LSTM networks, focused on OCR.
Apache License 2.0

Many NaN errors with the new Tensor-based version #63

Open ASDen opened 8 years ago

ASDen commented 8 years ago

For many models (especially deep ones with many parameters, e.g. bidi2), I keep getting the following error:

clstm.cc:664: void ocropus::GenericNPLSTM<F, G, H>::backward() [with int F = 1; int G = 2; int H = 2]: Assertion `!anynan(out)' failed.

The old (Mat-based) version works just fine with the same models.

ASDen commented 8 years ago

Can you please confirm the problem? Or is it just me misusing CLSTM?

MichalBusta commented 8 years ago

Hi, I believe the assert itself is wrong. It runs right after the input assignment, so the ".d" (derivative) parts are still uninitialized. Larger networks simply increase the probability that the uninitialized memory happens to contain a NaN.

the proper fix can be:

bool anynan(Batch &a) {
  if (anynan(a.v())) return true;
  if (anynan(a.d())) return true;  // this is failing
  return false;
}

bool anynan_v(Batch &a) {
  if (anynan(a.v())) return true;
  return false;
}

Then replace anynan with anynan_v during the forward step.
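To show the idea in isolation, here is a minimal self-contained sketch. It is not CLSTM's actual code: the `Batch` struct below is a hypothetical stand-in that uses plain `std::vector<float>` members instead of CLSTM's tensors, and `check_forward_output` is an illustrative helper name. It only demonstrates that a value-only check ignores whatever happens to be in the derivative buffer.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in for CLSTM's Batch: values (v) plus derivatives (d).
// The real class stores tensors and exposes them through v() and d().
struct Batch {
  std::vector<float> v;  // forward activations
  std::vector<float> d;  // gradients; only meaningful after the backward pass
};

// True if any element of the buffer is NaN.
bool anynan(const std::vector<float>& xs) {
  for (float x : xs) {
    if (std::isnan(x)) return true;
  }
  return false;
}

// Full check: values and derivatives.
bool anynan(const Batch& a) { return anynan(a.v) || anynan(a.d); }

// Value-only check: ignores the derivative buffer entirely.
bool anynan_v(const Batch& a) { return anynan(a.v); }

// Illustrative helper: assert only on what the forward step has written.
void check_forward_output(const Batch& out) { assert(!anynan_v(out)); }
```

With this sketch, a batch whose `d` buffer contains a NaN but whose `v` buffer is clean passes `check_forward_output`, while the full `anynan` check still reports it.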