tensorflow / tensor2tensor

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Apache License 2.0

Multiple translations (top-K) for a given input sentence in transformer Text2Text #977

Open repoloper opened 6 years ago

repoloper commented 6 years ago

Description

In the transformer-translation task, is there a way to get multiple alternate translations?

I've been trying to get multiple translations for an input sentence. I set num_decodes to 5 in --decode_hparams, but it had no effect. I am decoding from the dataset (not from a file), with a command along the lines of:

t2t-decoder --t2t_usr_dir ./usrdir --data_dir ~/datadir --problem translate_ende_wmt32k \
   --model transformer --hparams_set transformer_base --output_dir ~/outputdir --decode_to_file translations.en \
   --decode_hparams beam_size=8,alpha=0.9,num_decodes=5

I see that num_decodes is used in the code linked below, but I don't understand whether decode_once means that only one output comes out of the decoder for each input, even though it is called in a loop.

https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/utils/decoding.py#L173

Can provide more information as needed. ...

Environment information

OS: Ubuntu

$ pip freeze | grep tensor
tensor2tensor==1.7.0
tensorboard==1.8.0
tensorflow==1.8.0

$ python -V
Python 3.5.2

zealseeker commented 5 years ago

Refer to #1334

t2t-decoder \
   --data_dir=$DATA_DIR \
   --problem=$PROBLEM \
   --model=$MODEL \
   --hparams_set=$HPARAMS \
   --output_dir=$TRAIN_DIR \
   --decode_hparams="beam_size=$BEAM_SIZE,alpha=$ALPHA,return_beams=True" \
   --t2t_usr_dir=$USR_DIR

return_beams=True is what makes the decoder return the alternatives instead of only the top translation. beam_size is the number of alternatives you get for each input.
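
Once the alternatives are written out, they still have to be separated per input sentence. Below is a minimal Python sketch of that post-processing step. It assumes the output file holds one line per input, with that input's beams separated by tabs; that layout is an assumption and may differ between tensor2tensor versions and between decoding from a file and decoding from a dataset, so check your own output first. The helper names split_beams and read_alternatives are hypothetical and not part of tensor2tensor.

```python
from typing import List


def split_beams(line: str, beam_size: int) -> List[str]:
    """Split one decoded line into its alternative translations.

    ASSUMPTION: all beams for one input sit on a single line, separated
    by tabs. Verify this against the file your decoder actually writes.
    """
    beams = line.rstrip("\n").split("\t")
    if len(beams) != beam_size:
        # Fail loudly if the file layout does not match the assumption.
        raise ValueError(
            "expected %d beams, found %d" % (beam_size, len(beams))
        )
    return beams


def read_alternatives(path: str, beam_size: int) -> List[List[str]]:
    """Return one list of alternatives per non-empty line of the file."""
    with open(path, encoding="utf-8") as handle:
        return [split_beams(line, beam_size) for line in handle if line.strip()]
```

With beam_size=8, read_alternatives("translations.en", 8) would return one list of eight candidate strings per input sentence, provided the file follows the assumed layout. The length check is deliberate: if the beams are laid out differently, the helper raises an error instead of silently mixing up sentences.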