Open · hughbzhang opened this issue 6 years ago
If you trace the hparams through the various layers of modification, you'll see transformer_small -> transformer_base -> transformer_base_v2 -> transformer_base_v1 -> common_hparams.basic_params1. In basic_params1, sampling_method is set to "argmax": https://github.com/tensorflow/tensor2tensor/blob/6969fab42200a7da11bc40c9537b76b0a204b46a/tensor2tensor/layers/common_hparams.py#L90 and it is never changed as the hparam set is modified into transformer_small. The same is true for transformer_base and the preset hparams in attention_lm.py.
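The inheritance chain above can be illustrated with a minimal sketch. This is not the actual t2t code (which uses HParams objects, not dicts); the point is only that each derived hparam set copies its parent and overrides the fields it names, so a default like sampling_method="argmax" survives all the way down unless some level explicitly changes it:

```python
# Hypothetical sketch of the hparam inheritance chain (dicts stand in
# for t2t's HParams objects; values are illustrative, not the real ones).

def basic_params1():
    return {"sampling_method": "argmax", "hidden_size": 512}

def transformer_base_v1():
    hparams = basic_params1()
    hparams.update({"hidden_size": 512})   # does not touch sampling_method
    return hparams

def transformer_small():
    hparams = transformer_base_v1()
    hparams.update({"hidden_size": 256})   # still argmax
    return hparams

# The default set at the root is inherited unchanged.
assert transformer_small()["sampling_method"] == "argmax"
```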
Stanley, thanks for your response!
We saw that hyperparameter and tried to change it on the t2t-decoder (we also tried on t2t-trainer, but that didn't work, and we thought maybe it's not necessary since you don't sample at train time anyway).
I also tried the nuclear option of installing tensor2tensor from source and manually changing sampling_method="random",  # "argmax" or "random"
in case the hyperparameter passing wasn't working, but the results are all the same.
Have you tried logging/printing some things around here: https://github.com/tensorflow/tensor2tensor/blob/a4fa55a3f128753d006d26ba8691eb97d14fbcfc/tensor2tensor/utils/t2t_model.py#L1087 to see what the distribution you're sampling from looks like? Does the code even reach this function?
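For reference, here is a minimal NumPy sketch of what temperature sampling looks like (this is an assumption about the intended behavior, not the t2t implementation at the linked line): at temperature 0 it degenerates to argmax, which would explain repeated identical output even when "random" sampling is nominally enabled.

```python
import numpy as np

def sample_with_temperature(logits, temperature):
    """Return a token id: argmax when temperature == 0,
    otherwise sample from softmax(logits / temperature)."""
    logits = np.asarray(logits, dtype=np.float64)
    if temperature == 0.0:
        return int(np.argmax(logits))
    scaled = logits / temperature
    # Subtract the max for numerical stability before exponentiating.
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(np.random.choice(len(logits), p=probs))
```

Logging the probability vector here (and whether the function is reached at all) would show immediately whether decoding is stuck on the argmax path.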
I have found two issues when I use a trained language model to decode a sentence.

1. The demo problem languagemodel_ptb10k generates a vocabulary file that has the word "the" with id 0; thus <pad>'s id is 1 and <EOS>'s id is 2, so this line will give the wrong eos_id to beam_search decoding, which results in a wrong terminal state: https://github.com/tensorflow/tensor2tensor/blob/1de75bda4bd4c98ca50bcdbcf5e94b388bf9a044/tensor2tensor/models/transformer.py#L812
2. A language model problem has only targets, so if the model decodes those target words, they will be stripped; see this line: https://github.com/tensorflow/tensor2tensor/blob/57444300243f068bad88eb5ed51a9793c4bde172/tensor2tensor/models/transformer.py#L442 . However, in preprocessing, <EOS> is automatically added to the targets, so the model will then always decode <pad> after <EOS>. Thus nothing is output.
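One plausible way "the" could end up with id 0 is if the vocabulary ids were assigned purely by corpus frequency, without reserving the first ids for special tokens. The sketch below is hypothetical (it is not the actual languagemodel_ptb10k vocab-generation code) and contrasts that buggy ordering with the expected one where <pad> and <EOS> come first:

```python
from collections import Counter

def build_vocab_frequency_only(corpus_tokens):
    # Buggy variant: ids assigned purely by descending frequency, so the
    # most common word ("the" in PTB) gets id 0, displacing <pad> and <EOS>.
    counts = Counter(corpus_tokens)
    return {tok: i for i, (tok, _) in enumerate(counts.most_common())}

def build_vocab_with_reserved(corpus_tokens):
    # Expected behavior: reserved tokens occupy the first ids, and real
    # words are appended after them in frequency order.
    vocab = {"<pad>": 0, "<EOS>": 1}
    counts = Counter(corpus_tokens)
    for tok, _ in counts.most_common():
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab
```

If the generated vocab file shows "the" on its first line, the frequency-only path (or something equivalent) is the likely culprit.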
Quite strange -- could anyone figure out why "the" ends up with id = 0? We can look into it, but we would appreciate any help fixing it!
Thanks to everyone for the debugging.
@lukaszkaiser @rsepassi
I noticed that if I use a beam_size of 1, then it goes into the "greedy" decoding path; that path does look at the sampling_temp hyperparameter, and if I specify a value of 1.0, it seems to correctly sample random tokens (which is great). Am I correct that one needs to specify a beam_size of 1 and a non-zero sampling_temp to generate random text? If so, perhaps there should be a warning if sampling_method is "random" but beam_size is not 1, or if sampling_temp is 0?
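The dispatch behavior described above, including the suggested warnings, can be sketched as follows. This is a hypothetical outline of the observed behavior, not the actual t2t decode logic: beam search ignores sampling_method entirely, and sampling only happens on the greedy path when sampling_temp is non-zero.

```python
import warnings

def choose_decode_path(beam_size, sampling_method, sampling_temp):
    """Return which decode path would run, warning about silently
    ignored sampling settings (hypothetical names and logic)."""
    if beam_size > 1:
        if sampling_method == "random":
            warnings.warn("sampling_method=random is ignored when beam_size > 1")
        return "beam_search"
    # beam_size == 1: the "greedy" path, which honors sampling_temp.
    if sampling_temp == 0.0:
        if sampling_method == "random":
            warnings.warn("sampling_temp=0 makes random sampling degenerate to argmax")
        return "argmax"
    return "sample_with_temperature"
```

With this framing, the original report's symptom (identical "the the the ..." output) is exactly what the argmax branch produces whenever beam_size > 1 or sampling_temp is 0, regardless of sampling_method.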
Description
We tried running language modeling with languagemodel_ptb10k and transformer_small as recommended in the README. There were no errors and the TensorBoard training curves looked fine, but the decoder output is something like "the the the the the the the" (and identical every time).
We looked through the code and found --hparams='sampling_method=random', but it still seems to be argmaxing instead of sampling (or maybe something else is wrong?). We have also tried with languagemodel_ptb_characters and with transformer_base and attention_lm with similar results (no sampling, same degenerate output every time).
Is there some flag that we are missing? Code below.
Thanks for the help in advance!
...
Environment information
For bugs: reproduction and error logs
Steps to reproduce:
...
input.txt is a blank file with a dozen empty lines