tensorflow / tensor2tensor

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.

Language Modeling Does Not Sample (even with sampling_method=random) #884

Open hughbzhang opened 6 years ago

hughbzhang commented 6 years ago

Description

We tried running language modeling with languagemodel_ptb10k and transformer_small, as recommended in the README. There were no errors and the TensorBoard training curves looked fine, but the decoder output is something like "the the the the the the the" (and it is identical every time).

We looked through the code and found --hparams='sampling_method=random', but decoding still seems to be argmaxing instead of sampling (or maybe something else is wrong?). We have also tried languagemodel_ptb_characters, as well as transformer_base and attention_lm, with similar results (no sampling, and the same degenerate output every time).

Is there a flag that we are missing? Code is below.

Thanks for the help in advance!

...

Environment information

OS: Ubuntu 14.04

$ pip freeze | grep tensor
tensor2tensor==1.6.5
tensorboard==1.8.0
tensorflow==1.8.0

$ python -V
# Python 3.6.5 :: Anaconda, Inc.

For bugs: reproduction and error logs

Steps to reproduce:

...


PROBLEM=languagemodel_ptb10k
MODEL=transformer
HPARAMS=transformer_small

DATA_DIR=$HOME/t2t_data
TMP_DIR=/tmp/t2t_datagen
TRAIN_DIR=$HOME/t2t_train/$PROBLEM/$MODEL-$HPARAMS

mkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR

t2t-datagen \
  --data_dir=$DATA_DIR \
  --tmp_dir=$TMP_DIR \
  --problem=$PROBLEM

t2t-trainer \
  --data_dir=$DATA_DIR \
  --problem=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR

BEAM_SIZE=4
ALPHA=0.6

t2t-decoder \
  --data_dir=$DATA_DIR \
  --problem=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --hparams='sampling_method=random' \
  --output_dir=$TRAIN_DIR \
  --decode_hparams="beam_size=$BEAM_SIZE,alpha=$ALPHA" \
  --decode_from_file=input.txt \
  --decode_to_file=output.txt

input.txt is a blank file with a dozen empty lines.
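For reference, a one-liner that produces such a file (filename matching the script above):

# Create input.txt with a dozen empty lines, as used in the repro.
with open("input.txt", "w") as f:
    f.write("\n" * 12)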

s-xie commented 6 years ago

If you trace the hparams through the various layers of modification, you'll see transformer_small -> transformer_base -> transformer_base_v2 -> transformer_base_v1 -> common_hparams.basic_params1. In basic_params1, sampling_method is set to "argmax": https://github.com/tensorflow/tensor2tensor/blob/6969fab42200a7da11bc40c9537b76b0a204b46a/tensor2tensor/layers/common_hparams.py#L90 and it is never changed as the hparams set is modified into transformer_small. The same is true for transformer_base and for the preset hparams in attention_lm.py.
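A minimal sketch of that default, assuming a 1.6.x-era tensor2tensor install (module layout may differ in other versions):

from tensor2tensor.layers import common_hparams

# basic_params1 is the root hparams set. Every derived set (transformer_base,
# transformer_small, ...) starts from a copy of it, and since none of them
# overrides sampling_method, this default is what decoding ends up using.
hp = common_hparams.basic_params1()
print(hp.sampling_method)  # -> "argmax"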

hughbzhang commented 6 years ago

Stanley, thanks for your response!

We saw that hyperparameter and tried to change it via t2t-decoder (we also tried it with t2t-trainer, but that didn't work, and we figured it may not be necessary anyway, since you don't sample at train time).

I also tried the nuclear option of installing tensor2tensor from source and manually changing the default line to sampling_method="random",  # "argmax" or "random" in case the --hparams override wasn't being applied, but the results are all the same.
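For what it's worth, a hedged way to confirm that the override is at least parsed (hp.parse uses the same key=value syntax as the --hparams flag; assuming a 1.6.x-era install):

from tensor2tensor.models import transformer

hp = transformer.transformer_small()  # the hparams set from the repro script
hp.parse("sampling_method=random")    # what --hparams=... should apply
print(hp.sampling_method)             # -> "random" if parsing works

If this prints "random" but decoding still argmaxes, the problem is downstream of hparams parsing.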

s-xie commented 6 years ago

Have you tried logging/printing some things around here: https://github.com/tensorflow/tensor2tensor/blob/a4fa55a3f128753d006d26ba8691eb97d14fbcfc/tensor2tensor/utils/t2t_model.py#L1087 to see what the distribution you're sampling from looks like? Does the code even reach this function?
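For example, a self-contained TF 1.8-era sketch of that kind of logging; the idea is to wrap the logits tensor at the linked line (the variable name is an assumption, not a quote of the code):

import tensorflow as tf

def log_peak_probs(logits):
    # Peak softmax probability per position: values near 1.0 mean the model
    # is so confident that random sampling looks identical to argmax.
    peak = tf.reduce_max(tf.nn.softmax(logits), axis=-1)
    return tf.Print(logits, [peak], message="max softmax prob: ", summarize=20)

If the decode run never prints anything, the sampling branch is never reached; if it prints values near 1.0, the distribution itself is degenerate.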

Chanrom commented 6 years ago

I have run into two similar issues when using a trained language model to decode a sentence.

afrozenator commented 5 years ago

Quite strange -- could anyone figure out why "the" ends up with id = 0? We can look into it, but we would appreciate any help fixing it!

Thanks to everyone for the debugging.

@lukaszkaiser @rsepassi
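One hedged way to check that mapping from the problem's own encoders (registry API as of the 1.6.x era; /path/to/t2t_data stands in for the DATA_DIR from the repro script):

from tensor2tensor.utils import registry

problem = registry.problem("languagemodel_ptb10k")
encoders = problem.feature_encoders("/path/to/t2t_data")
# If id 0 decodes to "the", the degenerate output is just repeated id 0 --
# i.e. the decoder is emitting a constant id rather than sampling at all.
print(encoders["targets"].decode([0]))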

diego-s commented 5 years ago

I noticed that with a beam_size of 1, decoding goes into the "greedy" path, which does consult the sampling_temp hyperparameter; if I specify a value of 1.0, it seems to correctly sample random tokens (which is great). Am I correct that one needs to specify a beam_size of 1 and a non-zero sampling_temp to generate random text? If so, perhaps there should be a warning when sampling_method is "random" but beam_size is not 1, or when sampling_temp is 0. A sketch of this behavior follows.
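A minimal sketch of the greedy-path behavior described above (illustrative names and the TF 1.8-era tf.multinomial; not tensor2tensor's exact code):

import tensorflow as tf

def sample_next_token(logits, sampling_temp):
    # logits: [batch, vocab]. With sampling_temp == 0 this is plain argmax;
    # otherwise it draws one token from softmax(logits / sampling_temp).
    if sampling_temp == 0.0:
        return tf.argmax(logits, axis=-1)
    scaled = logits / sampling_temp
    return tf.squeeze(tf.multinomial(scaled, 1), axis=-1)

Concretely: pass --decode_hparams="beam_size=1" and include sampling_temp=1.0 in the --hparams string of the t2t-decoder call from the repro script above.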