nimasnjb opened 6 years ago
Confirming the same issue with repeated words using the transformer_prepend hparams.
@nimas62 which command did you use for training? Mine are below.
Prepend:
t2t-trainer --data_dir=~/t2t/data --output_dir=~/t2t/train/transformer_v2 --problems=summarize_cnn_dailymail32k --model=transformer --hparams_set=transformer_prepend --hparams=batch_size=4096 --train_steps=100000 --eval_steps=100 --worker_gpu=2 --keep_checkpoint_max=10
Original:
t2t-trainer --data_dir=~/t2t/data --output_dir=~/t2t/train/transformer_v1 --problems=summarize_cnn_dailymail32k --model=transformer --hparams_set=transformer_base --hparams=batch_size=4096 --train_steps=100000 --eval_steps=100 --worker_gpu=2 --keep_checkpoint_max=10
It's all about the hyperparameters, so there's no need to include your command. The transformer_prepend and transformer_base sets share the same origin, transformer_base_v2; you can check all the hparams in tensor2tensor/models/transformer.py. The differences are max_length=0 and prepend_inputs_masked_attention in transformer_prepend. I checked max_length=0 and it's fine; the problem lies in the masked attention.
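If you want to verify the difference yourself, here is a quick sketch (it assumes both hparams functions are defined in tensor2tensor/models/transformer.py, as they are in the versions discussed here):

```python
from tensor2tensor.models import transformer

# HParams.values() returns a plain dict of hyperparameter names to values.
base = transformer.transformer_base().values()
prepend = transformer.transformer_prepend().values()

# Print every hyperparameter whose value differs between the two sets.
for key in sorted(set(base) | set(prepend)):
    if base.get(key) != prepend.get(key):
        print(key, base.get(key), "->", prepend.get(key))
```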
@nimas62 I tried to decode from my current model, trained with batch_size=1536 and learning_rate=0.2 for 75,002 steps. (Some metrics are bugged: in this run, for example, ROUGE-2 is only logged up to step 42k, while the top-5 accuracy goes up to step 75k.)
I tried to decode from my model. The steps I followed were: copy one article into a .txt file named "newsum.txt" inside t2t_data, and create an empty file named "summarization.txt" in the same folder.
the command that I ran:
t2t-decoder --data_dir=~/t2t_data --problems=summarize_cnn_dailymail32k --model=transformer --hparams_set=transformer_prepend --output_dir=~/t2t_train/sum2 --decode_hparams="beam_size=4,alpha=0.6" --decode_from_file=/home/proto/t2t_data/newsum.txt --decode_to_file=/home/proto/t2t_data/summarization.txt
the input:
WASHINGTON — Congressional Republicans have a new fear when it comes to their health care lawsuit against the Obama administration: They might win. The incoming Trump administration could choose to no longer defend the executive branch against the suit, which challenges the administration’s authority to spend billions of dollars on health insurance subsidies for and Americans, handing House Republicans a big {...}
the output was:
INFO:tensorflow:Inference results OUTPUT: . . . . . . . . . . . . . . . . . . . . £ £ £ £ £ £ £ £ [the output continues as hundreds of repeated "." tokens followed by hundreds of repeated "£" tokens; the same degenerate output is logged twice, and the logged INPUT line is empty]
I suppose that I did something wrong. I will check what happened, but meanwhile... this is my result.
Results of the model after 17k steps (all the hyperparameters are the defaults from transformer_prepend; I just changed the batch_size to 1532 because I only have 8GB of GPU memory). I know that's only a few steps, but the results do not seem promising.
It would be good if anyone could share a good pre-trained model.
What I got isn't any better than this; the model doesn't work well. The quick start guide says, "We suggest to use --model=transformer and --hparams_set=transformer_prepend for this task. This yields good ROUGE scores." I wish someone would add more details so that we could replicate the experiment. Tuning a model takes time and patience, but when you can't rely on the model, you don't know whether you need to tune it better or whether there's a bug, and it's really confusing.
Which Python/TensorFlow combination are you using? I had Python 2.7 (and the matching TensorFlow build, of course). I should try Python 3.5.
I am using TF 1.7rc and Python 3.5. By the way, I have found another repo that you could check for summarization: https://github.com/abisee/pointer-generator. I have found that the state of the art in abstractive summarization, which is what I was looking for, is not very promising, so check out that repo and see if it is helpful to you, @nimas62. Happy coding.
This was indeed a bug that should now be fixed.
I still have the same problem. I am using TensorFlow 1.10, tensor2tensor 1.9, and Python 3.6.
Same here: a lot of repetition of sentences that make no sense.
I also trained the transformer for summarization. I'm using TensorFlow 1.9, tensor2tensor 1.9, Python 3.5.2. I used hparams=transformer_prepend and model=transformer for training. Due to memory constraints, I used batch_size=256 and trained the model for around 160k training steps. I check how well the model is doing by creating a new .txt file that contains an article from CNN and using t2t-decoder to generate a summary. My command is as follows:
t2t-decoder --data_dir=$DATA_DIR --problem=$PROBLEM --model=$MODEL --hparams_set=$HPARAMS --output_dir=$TRAIN_DIR --decode_hparams="beam_size=$BEAM_SIZE,alpha=$ALPHA" --decode_from_file=$DECODE_FILE --decode_to_file=summarization.txt
where HPARAMS=transformer_prepend, model=transformer, beam_size=4, alpha=0.6
Here is the result from the log:
INFO:tensorflow:Inference results INPUT: 6 lesser-known facts about Iran's Foreign Minister Javad Zarif. If you've been following the news lately, there are certain things you doubtless know about Mohammad Javad Zarif.He is, of course, the Iranian foreign minister. He has been U.S. Secretary of State John Kerry's opposite number in securing a breakthrough in nuclear discussions that could lead to an end to sanctions against Iran -- if the details can be worked out in the coming weeks. And he received a hero's welcome as he arrived in Iran on a sunny Friday morning."Long live Zarif," crowds chanted as his car rolled slowly down the packed street.You may well have read that he is "polished" and, unusually for one burdened with such weighty issues, "jovial." An Internet search for "Mohammad Javad Zarif" and "jovial" yields thousands of results. He certainly has gone a long way to bring Iran in from the cold and allow it to rejoin the international community. But there are some facts about Zarif that are less well-known. Here are six:

INFO:tensorflow:Inference results OUTPUT: By. Simon Jones for the Daily Mail. Everton are considering a move for Everton are willing to sign Stoke City's interest in Stoke City's Brazilian international but Stoke are willing to sign the 28-year-old. Everton are willing to offer £5m for the 26-year-old who is also keen on West Bromwich Albion. Wanted man: Everton are keen on signing Stoke City'o. Wanted man: Everton are interested in Stoke City's young defender Ryan Sunday. On the move: Everton are set to sign Stoke City's move. Meanwhile, Stoke City are keen on Stoke City's Kevin Sunday who was on loan at Stoke City.

INFO:tensorflow:Inference results INPUT: He was educated in the land of the 'Great Satan'. His foreign ministry notes, perhaps defensively, that "due to the political and security conditions of the time, he decided to continue his education in the United States." That is another way of saying that he was outside the country during the demonstrations against the Shah of Iran, which began in 1977, and during the Iranian Revolution, which drove the shah from power in 1979. Zarif left the country in 1977, received his undergraduate degree from San Francisco State University in 1981, his master's in international relations from the University of Denver in 1984 and his doctorate from the University of Denver in 1988. Both of his children were born in the United States.

INFO:tensorflow:Inference results OUTPUT: By. Simon Jones for the Daily Mail. Everton are considering a move for Everton are willing to sign Stoke City's interest in Stoke City's Brazilian international but Stoke are keen on the youngster following the arrival of Stoke City left-back. The 22-year-old is a target for West Bromwich Albion but Stoke are also keen on the deal for Stoke City left-back. Target: Stoke are interested in Stoke City's Championship clubs want to take him on loan. On the move: Stoke City are set to sign Stoke City's Los Angeles. Stoke City are willing to offer a deal for around £3million.

INFO:tensorflow:Inference results INPUT: He tweets in English. In September 2013, Zarif tweeted "Happy Rosh Hashanah," referring to the Jewish New Year. That prompted Christine Pelosi, the daughter of House Minority Leader Nancy Pelosi, to respond with a tweet of her own: "Thanks. The New Year would be even sweeter if you would end Iran's Holocaust denial, sir." And, perhaps to her surprise, Pelosi got a response. "Iran never denied it," Zarif tweeted back. "The man who was perceived to be denying it is now gone. Happy New Year." The reference was likely to former Iranian President Mahmoud Ahmadinejad, who had left office the previous month. Zarif was nominated to be foreign minister by Ahmadinejad's successor, Hassan Rouhami.

INFO:tensorflow:Inference results OUTPUT: By. Simon Jones for the Daily Mail. Everton are considering a move for Everton are willing to offer £5million for the 26-year-old who is keen on a new contract. Everton are willing to offer £5million for the striker. Everton are also interested in Stoke City'o. VIDEO Scroll down to watch Everton are interested in Swansea left-back. .

INFO:tensorflow:Inference results INPUT: His precise age is uncertain. The website of the Iranian Foreign Ministry, which Zarif runs, cannot even agree with itself on when he was born. The first sentence of his official biography, perhaps in a nod to the powers that be in Tehran, says Zarif was "born to a religious traditional family in Tehran in 1959." Later on the same page, however, his date of birth is listed as January 8, 1960. And the Iranian Diplomacy website says he was born in in 1961. So he is 54, 55 or maybe even 56. Whichever, he is still considerably younger than his opposite number, Kerry, who is 71.

INFO:tensorflow:Inference results OUTPUT: By. Simon Jones for the Daily Mail. Everton are considering a move for Everton are willing to sign Stoke City's interest in Stoke City's Brazilian international. Everton are willing to sign Stoke City are willing to sign Stoke City and Sunderland are keen on the 28-year-old who is yet to offer £5million for the 26-year-old. Wanted man: Everton are interested in Stoke City'o. On the move? Everton are keen on signing Everton are interested in signing Stoke City's young international man but Stoke City are ready to sign him. Meanwhile, Stoke City are showing interest in Stoke City left-back. .
This result makes no sense at all. I would like to ask whether this is due to not training the model for enough steps (i.e., can a better result be achieved with more training steps) or to training the model in the wrong way. I have read several issues about summarization with tensor2tensor's transformer. Is my result due to the mentioned bugs? Have they been fixed?
Here are the details of the training:
@hoang-ho, @gacelardi: you mentioned that you decreased the batch_size in order to fit the model onto your GPU. Please note that in transformer_prepend, max_length is set to zero, which means that max_length = batch_size. By decreasing batch_size, the model may not be trained on the necessary data length. The excerpts from TensorBoard and the loss results stated above look like the model is diverging, which may be due to the decreased batch_size. You may decrease the learning_rate to counteract divergence, but that may not be sufficient, as the max_length may still be too low due to the decreased batch_size.
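To make the coupling concrete, here is a minimal sketch of the fallback as I understand it (illustrative function name, not the exact tensor2tensor internals):

```python
def effective_max_length(hparams):
    # max_length == 0 means "no explicit cap", so the cap falls back to
    # batch_size; a smaller batch_size therefore also shortens the
    # longest article the model is ever trained on.
    return hparams.max_length or hparams.batch_size
```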
Have you tried setting max_length to a certain number while decreasing batch_size?
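For example (a hypothetical invocation mirroring the training commands above; paths and values are illustrative):

t2t-trainer --data_dir=~/t2t/data --output_dir=~/t2t/train/transformer_prepend --problems=summarize_cnn_dailymail32k --model=transformer --hparams_set=transformer_prepend --hparams=batch_size=1536,max_length=1536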
You may also try transformer_moe, which will allow you to train models with greater batch sizes without needing more computation.
Making sure batch_size is 4096 when max_length=0 really helps!
PROBLEM=summarize_cnn_dailymail32k
MODEL=transformer
HPARAMS=transformer_tpu
t2t-trainer \
--data_dir=$DATA_DIR \
--problem=$PROBLEM \
--model=$MODEL \
--hparams_set=$HPARAMS \
--worker_gpu=4 \
--hparams="batch_size=4096,max_length=0"
INFO:tensorflow:Saving dict for global step 2052000: global_step = 2052000, loss = 2.6060317, metrics-summarize_cnn_dailymail32k/targets/accuracy = 0.5484919, metrics-summarize_cnn_dailymail32k/targets/accuracy_per_sequence = 0.0, metrics-summarize_cnn_dailymail32k/targets/accuracy_top5 = 0.7531322, metrics-summarize_cnn_dailymail32k/targets/approx_bleu_score = 0.23455855, metrics-summarize_cnn_dailymail32k/targets/neg_log_perplexity = -2.6244667, metrics-summarize_cnn_dailymail32k/targets/rouge_2_fscore = 0.31604615, metrics-summarize_cnn_dailymail32k/targets/rouge_L_fscore = 0.4611887
Anything new on the subject? I have the same problem of an output that is a sequence of a repeated word.
I think there's no need to set "prepend_inputs_masked_attention" for the summarization task. It literally just prepends the encoder inputs to the decoder inputs with a single token in between. You can use it if you want to use the Transformer as a language model.
You can check the hyperparameter and the comment in tensor2tensor/layers/common_hparams.py:
```
prepend_inputs_masked_attention
replace "targets" in preprocessing with
tf.concat([inputs, [0], targets], axis=1)
i.e. we prepend the inputs to the targets with a single
padding token in between. Use masked self-attention on the
entire resulting sequence. During training, we compute losses on
the combined sequence. During eval, we compute the metrics
on only the targets portion.
```
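For illustration, a minimal runnable sketch of that preprocessing for batched [batch, length] integer tensors (my own paraphrase of the comment above, not the exact tensor2tensor code):

```python
import tensorflow as tf

def prepend_inputs(inputs, targets):
    # Prepend the inputs to the targets with a single padding token (0)
    # in between; the model then runs masked self-attention over the
    # combined sequence, like a language model over one long text.
    pad = tf.zeros([tf.shape(inputs)[0], 1], dtype=inputs.dtype)
    return tf.concat([inputs, pad, targets], axis=1)
```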
Description
Problem: CNN_dailymail
Model: Transformer
Hparams: transformer_prepend, transformer_base_v2
When I train the model with the transformer_prepend hparams, the outputs of the decoder are a sequence of a repeated word: "foo, foo, foo, ...". I used transformer_base_v2 and set max_length to 0, so the only difference between these two sets of parameters is prepend_inputs_masked_attention. Without it, the outputs are as they are supposed to be, but the performance of the model decreases by approximately 40%. It must be a bug in the decoder.
TensorFlow and tensor2tensor versions
TensorFlow 1.6, tensor2tensor 1.5.5