tensorflow / tensor2tensor

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Apache License 2.0

Strange decoding results #1209

Open · mikeymezher opened this issue 5 years ago

mikeymezher commented 5 years ago

Description

Terms repeat themselves during decoding. Training and evaluation proceed as normal. This doesn't appear to be an issue with decoding itself, as models trained in the past (pre T2T 1.10.0) perform fine.

This is occurring on a Text2Self-type problem (defined in text_problems.py). I'm not quite sure where this problem stems from. I've gone as far back as analyzing the logits produced in smoothing_cross_entropy (common_layers), and they appear fine.

Example:

LABELS AT SOFTMAX CROSS ENTROPY:
[[[[33481]] [[1089]] [[33480]] [[33425]] [[33423]] [[3317]] [[33424]] [[33418]] [[33416]] [[3311]]] ...]

TOP 5 IDS INDICES:
[[[[[10943 30030 10905 17983 31291]]] [[[1087 2167 1085 1093 1079]]] [[[33426 33484 33480 33481 33428]]] [[[33481 33425 33412 33469 33482]]] [[[33423 33414 33412 33426 33480]]] [[[4435 4431 3319 4433 3321]]] [[[33412 33424 33416 1 33417]]] [[[33423 33418 3321 3319 3317]]] [[[33416 33424 33421 33417 33423]]] [[[3319 3317 33394 4393 4391]]]] ...]

TOP 5 IDS VALUES:
[[[[[6.24065685 5.62128639 5.55350065 5.52335882 5.27032423]]] [[[9.33180618 9.04827118 8.96213 8.10716248 8.00762272]]]] ...]

The first index is consistently wrong (the model would have no previous insight at this point), and the predictions become progressively more accurate after that, which seems correct. Evaluation also yields fine results.
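For reference, the dump above can be reproduced with something along these lines around the logits in smoothing_cross_entropy (TF 1.x sketch; the helper name and message string are illustrative, not the exact code I ran):

import tensorflow as tf

def dump_top_k(logits, labels, k=5):
  # logits: [batch, length, 1, 1, vocab_size]; labels: [batch, length, 1, 1]
  top_values, top_indices = tf.nn.top_k(logits, k=k)
  # tf.Print threads the debug output through the graph without altering logits.
  return tf.Print(
      logits, [labels, top_indices, top_values],
      message="LABELS / TOP-%d IDS INDICES / VALUES: " % k, summarize=60)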

During decoding, however, everything breaks: the imported weights lead to somewhat reasonable predictions for the second-highest logit, but the top logit is consistently the most recent input term itself. I suspected the inputs weren't being shifted to the right during training, but that doesn't appear to be the case.
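For reference, the right shift I was checking for is the one the Transformer applies when preparing decoder inputs: prepend a padded start position and drop the last step. A minimal TF 1.x sketch of that operation (mirroring what common_layers.shift_right_3d is meant to do; illustrative only, not the actual T2T code path):

import tensorflow as tf

def shift_targets_right(targets):
  # targets: [batch, length, hidden]. Decoder inputs should be the targets
  # shifted right by one step, so position t never sees target token t.
  return tf.pad(targets, [[0, 0], [1, 0], [0, 0]])[:, :-1, :]

with tf.Session() as sess:
  t = tf.constant([[[1.], [2.], [3.], [4.]]])   # fake targets, length 4
  print(sess.run(shift_targets_right(t)))       # [[[0.], [1.], [2.], [3.]]]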

...

Environment information

T2T: 1.10.0
TF: 1.11.0

mikeymezher commented 5 years ago

It's also important to note that proximal ids tend to be similar terms (i.e. 35000 ≈ 35001), which is why I know the above "top 5 ids indices" (which equate to the top 5 terms during decoding) are reasonable relative to their targets during training.

mehmedes commented 5 years ago

I also experience decoding issues with models trained since T2T 1.10.0. The decoder repeats the first word over and over again until hitting max_length. "How are you", for example, is translated into "Wie Wie Wie Wie Wie Wie Wie Wie Wie Wie Wie Wie Wie Wie ..." Loss and evaluation stats during training are fine.
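Just to illustrate the loop itself: greedy autoregressive decoding feeds the last emitted token back in at every step, so once the model prefers to echo that token, the output can only repeat until max_length. A toy sketch in plain Python (nothing T2T-specific, and purely hypothetical about the cause):

def echo_model(decoded_so_far):
  # Stand-in for a model that has learned to copy its most recent decoder
  # input instead of predicting the next token.
  return decoded_so_far[-1]

def greedy_decode(start_id, max_length):
  outputs = [start_id]
  for _ in range(max_length - 1):
    outputs.append(echo_model(outputs))  # feed the last prediction back in
  return outputs

print(greedy_decode(start_id=7, max_length=10))
# [7, 7, 7, 7, 7, 7, 7, 7, 7, 7]  -- analogous to "Wie Wie Wie ..."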

mikeymezher commented 5 years ago

@mehmedes Out of curiosity, is your problem type Text2Text (full encoder + decoder stacks)?

mehmedes commented 5 years ago

I suppose so. I run a translate_wmt problem.