Single430 opened this issue 5 years ago
@lukaszkaiser
Try not using prepending? I know the docs say to do that for summarization, but it's only given me weird results.
How many steps did you train? @epurdy
Did you set `prepend_mode` to `none`? Otherwise the encoder input is concatenated onto the decoder input.
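To make the distinction concrete, here is a minimal sketch of what the two modes discussed above do. The function name and the list-based representation are hypothetical; tensor2tensor's real pipeline operates on `tf.Tensor` batches, and the exact separator handling varies by mode.

```python
# Hypothetical sketch: how prepend_mode changes the decoder-side sequence.
# Token IDs are illustrative only.

def build_decoder_input(inputs, targets, prepend_mode):
    """Return the decoder-side token sequence for a given prepend_mode."""
    if prepend_mode == "none":
        # Standard encoder-decoder: the decoder sees only the targets;
        # the source is consumed by the encoder.
        return list(targets)
    # "prepend_inputs_*" modes: the source tokens are concatenated in
    # front of the targets (with a separator) and the model attends
    # over the combined sequence.
    return list(inputs) + [0] + list(targets)  # 0 stands in for a separator

print(build_decoder_input([5, 6, 7], [8, 9], "none"))
# [8, 9]
print(build_decoder_input([5, 6, 7], [8, 9], "prepend_inputs_full_attention"))
# [5, 6, 7, 0, 8, 9]
```

Note that with `HPARAMS=transformer_prepend`, passing `prepend_mode=none` in the `--hparams` override effectively falls back to standard encoder-decoder behavior.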
```
PROBLEM=summarize_cnn_dailymail32k
MODEL=transformer
HPARAMS=transformer_prepend

t2t-trainer \
  --data_dir=$DATA_DIR \
  --problem=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR \
  --worker_gpu=4 \
  --hparams='batch_size=4096,prepend_mode=none,max_input_seq_length=512,max_target_seq_length=100' \
  --train_steps=200000 \
  --keep_checkpoint_max=10 \
  --local_eval_frequency=5000 \
  --eval_steps=30 \
  --eval_run_autoregressive=False
```
Description
The results of my Chinese abstractive summarization training with tensor2tensor are very poor. I have been investigating for a long time but still cannot find the problem; I hope I can get a reply.
Environment information
For bugs: reproduction and error logs