yajiemiao / eesen

The official repository of the Eesen project
Apache License 2.0

Did you have experience with Obj = nan, TokenAcc = nan%? #2

Closed: lifelongeek closed this issue 8 years ago

lifelongeek commented 8 years ago

Hi, I am slightly modifying your character-based RNN+CTC experiment on swbd. I am trying to use a minimal character set (alphabet (26) + {space ' - } + noise + laugh + vocal-noise) instead of including all characters such as digits, & and _. Thus the RNN has 34 output units. For this experiment, I had to modify lexicon2.txt and units.txt, which makes the transcriptions longer than before. For example, 260 : t w o - s i x t y, 401k : f o u r - o - o n e - k

However, this experiment consistently produces nan for Obj and TokenAcc, even when I try a smaller learning rate and various RNN architectures. I suspect this is because 'train-ctc-parallel' does not rescale alpha and beta during the forward-backward algorithm. The non-parallel version seems to use a rescaling kernel (i.e. _compute_ctc_alpha_one_sequence_rescale), but the parallel version uses code without rescaling.

Have you had a similar experience with nan errors? Did I miss the rescaling part in your code? I hope I have not made a mistake and bothered you unnecessarily.

Here are a few example lines from the log:

VLOG1 After 20 sequences (0.000913889Hr): Obj(log[Pzx]) = -50.5868 TokenAcc = -nan%
VLOG1 After 40 sequences (0.00273056Hr): Obj(log[Pzx]) = nan TokenAcc = -260%
VLOG1 After 60 sequences (0.00498056Hr): Obj(log[Pzx]) = -59.5562 TokenAcc = -140.909%
VLOG1 After 80 sequences (0.00747778Hr): Obj(log[Pzx]) = -75.5068 TokenAcc = -34.9206%
VLOG1 After 100 sequences (0.0101417Hr): Obj(log[Pzx]) = -67.462 TokenAcc = -66.6667%
VLOG1 After 120 sequences (0.0129083Hr): Obj(log[Pzx]) = -31.2848 TokenAcc = 11.6279%
...
VLOG1 After 740 sequences (0.1354Hr): Obj(log[Pzx]) = 10.6927 TokenAcc = 0%
VLOG1 After 760 sequences (0.140061Hr): Obj(log[Pzx]) = 8.33837 TokenAcc = 0%
VLOG1 After 780 sequences (0.144731Hr): Obj(log[Pzx]) = 4.95938 TokenAcc = 0%
VLOG1 After 800 sequences (0.149453Hr): Obj(log[Pzx]) = 2.56755 TokenAcc = 0%
VLOG1 After 820 sequences (0.154206Hr): Obj(log[Pzx]) = 9.55445 TokenAcc = 0%

yajiemiao commented 8 years ago

I don't think it's a problem with rescaling. The rescaling and non-rescaling versions do the same thing. Can you check if you have empty labels in your training data? That is, the label sequence for an utterance is empty.
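A minimal sketch of such a check, not part of Eesen itself. It assumes each line of the label file has the form `utterance_id label label ...`, so a line holding only an utterance ID has an empty label sequence. The function name `find_empty_utterances` is hypothetical.

```python
def find_empty_utterances(lines):
    """Return the IDs of utterances whose label sequence is empty.

    Assumed line format (not confirmed by the thread):
        utterance_id label_1 label_2 ...
    """
    empty = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore completely blank lines
        if len(fields) == 1:
            # Only the utterance ID is present, so there are no labels.
            empty.append(fields[0])
    return empty


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = ["utt1 3 7 7 2", "utt2", "utt3 5"]
    print(find_empty_utterances(sample))
```

To run it on a real file, pass an open file handle instead of the sample list, for example `find_empty_utterances(open("labels.tr"))`.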

lifelongeek commented 8 years ago

Found the reason. When I checked labels.tr, there was a label that exceeds the maximum lexicon unit index (i.e. 33). I should have checked this file first when I changed lexicon2.txt. Training no longer produces nan in Obj and TokenAcc.

Closing the issue.
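For anyone hitting the same symptom, the check described above can be sketched as follows. This is an illustration under assumptions, not Eesen code: it assumes each line of labels.tr is `utterance_id` followed by integer label indices, and that valid indices run from 1 to the maximum unit index (33 in this thread). The function name `find_out_of_range` and the `min_index` default are hypothetical.

```python
def find_out_of_range(lines, max_index, min_index=1):
    """Return (utterance_id, offending_labels) for lines with invalid labels.

    Assumptions (not confirmed by the thread):
      - each line is "utterance_id label_1 label_2 ..."
      - valid label indices lie in [min_index, max_index]
    """
    bad = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        utt_id = fields[0]
        labels = [int(x) for x in fields[1:]]
        offending = [l for l in labels if l < min_index or l > max_index]
        if offending:
            bad.append((utt_id, offending))
    return bad


if __name__ == "__main__":
    # Made-up sample data; 33 is the maximum index mentioned in the thread.
    sample = ["utt1 3 7 33", "utt2 5 40 2", "utt3 0 1"]
    for utt_id, offending in find_out_of_range(sample, max_index=33):
        print(utt_id, offending)
```

Running such a check after regenerating the label file from a modified lexicon2.txt and units.txt would flag any stale indices before training starts.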