qkaren / Counterfactual-StoryRW

code and data for EMNLP-19 paper "Counterfactual Story Reasoning and Generation" https://arxiv.org/abs/1909.04076

Longer sequences are being generated than expected. #11

Open fabrahman opened 5 years ago

fabrahman commented 5 years ago

Hi,

I trained a model using your code on different data, and I realized that for all the models I trained, the output sequences are a lot longer than expected, so toward the end of the output the generation quality gets worse and worse, with repetition. For example, in one of my models the training targets have 8 sentences, but the model keeps generating 13 or more.

I realize the sample length is defined here, but I don't know why in my case it generates longer sequences than expected. I assume the model should learn to generate the eos_token in a way that imitates the training data. Note that I commented out this block while training.

I wonder if you know of any strategy I could use to resolve this?

fabrahman commented 5 years ago

@qkaren I really appreciate your help.

qkaren commented 5 years ago

The model should be able to learn to generate EOS. This line truncates all tokens after EOS.

You may want to make sure that at training time the model does see the EOS. For example, when calling sequence_sparse_softmax_cross_entropy, make sure sequence_length counts the EOS token; otherwise the EOS token will be masked and the model won't see EOS at training time. E.g., here, full_len counts the EOS token.
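For illustration, here is a minimal self-contained sketch of that idea with Texar-TF (the tensors target_ids, logits, and seq_len_without_eos are toy stand-ins, not the repo's actual variables):

```python
import tensorflow as tf
import texar as tx  # Texar-TF, as used in this repo

# Toy batch of 1, vocab of 5; id 4 stands in for '<|endoftext|>'.
# The target already ends with the EOS id.
target_ids = tf.constant([[2, 3, 1, 4]])      # [batch, time]
logits = tf.random.normal([1, 4, 5])          # [batch, time, vocab]

seq_len_without_eos = tf.constant([3])        # length measured before EOS was appended
full_len = seq_len_without_eos + 1            # +1 so the EOS position is not masked out

loss = tx.losses.sequence_sparse_softmax_cross_entropy(
    labels=target_ids,         # includes the trailing EOS id
    logits=logits,
    sequence_length=full_len,  # counts the EOS, so the model is trained to emit it
)
```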

fabrahman commented 5 years ago


Hi Karen, thanks for your reply.

Sorry for my confusion: so '<|endoftext|>' here acts more like a PAD token? Since sequence_length is computed before padding with '<|endoftext|>' here, that means you had an extra '<|endoftext|>' at the end of each training instance before processing them, right?

fabrahman commented 5 years ago

edited my comment!

fabrahman commented 4 years ago

@qkaren I think you should add a similar line of code, _all_samples_text = tx.utils.strip_eos(_all_samples_text, eos_token='<|endoftext|>'), for the generated samples. Right now it only truncates EOS from the end of the input text.
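Roughly like this (a sketch; the sample string is made up, and _all_samples_text is assumed to be the list of decoded sample strings):

```python
import texar as tx

# Toy decoded sample, standing in for the model's generated text.
_all_samples_text = [
    "Molly had a great day . <|endoftext|> she decided to keep on eating popcorn",
]

# Drop '<|endoftext|>' and everything after it, mirroring what is already
# done for the input text.
_all_samples_text = tx.utils.strip_eos(_all_samples_text, eos_token='<|endoftext|>')
```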

khushboo-mehra commented 4 years ago

Hi @qkaren I ran the code, straight from this repo, on the timetravel dataset and ran into the same issue. The model rewrites the original story and then starts producing generic, repetitive phrases. I then tried truncating the generated samples as @fabrahman suggested, but the output was the same. So I think the current version of the code does not generate the EOS token. Can you please check whether the repo is up to date with the code you used for the paper? Thanks!

qkaren commented 4 years ago


Does the generated text include the token '<|endoftext|>' after the rewritten story (and before the gibberish)? Or could you post a couple of generated samples here? Thanks!

khushboo-mehra commented 4 years ago

No, there's no '<|endoftext|>' in the output.

Below are some examples from the m_supervised setting (but this pattern occurs in outputs from other variants as well):

In other days she took to the popcorn factory. They took a tour of the factory. Molly had a great day. Molly decided to keep on eating popcorn. She was proud of herself. Molly decided to try popcorn popcorn. She was proud that she knew what it was. Molly decided to try popcorn again. She decided to keep on eating popcorn

While on his way to the hospital, Tim was in a car accident. The tow truck driver offered to drop off Tim at the hospital. Tim was very thankful. Tim was very thankful. Tim was very thankful. Tim was very thankful. Tim was very thankful

khushboo-mehra commented 4 years ago

Hi @qkaren Any updates on this issue? Thanks!

qkaren commented 4 years ago

Sorry for the delay. It's likely that this version of the code has a bug such that the model does not see the '<|endoftext|>' token at training time. I'm still trying to find the latest code.

In the meantime, it'd be nice if you could quickly try some possible fixes, i.e., replacing here with len_x1x2y = len(x1x2y_ids) + 1, and here with len_x1x2yx1xx2yy = len(x1x2yx1xx2yy_ids).
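The intent is just that the computed lengths count the '<|endoftext|>' token the model should learn to emit; a toy sketch (the id lists here are made up, only the two assignments come from the suggestion above):

```python
# Toy stand-ins for the token-id lists built during preprocessing; in the repo
# these come from the GPT-2 encoder ('<|endoftext|>' has id 50256).
x1x2y_ids = [50, 51, 52]                        # assumed not to include the EOS id yet
x1x2yx1xx2yy_ids = [50, 51, 52, 60, 61, 50256]  # assumed to already end with the EOS id

# Suggested replacements:
len_x1x2y = len(x1x2y_ids) + 1            # +1 so the EOS token is counted in the length
len_x1x2yx1xx2yy = len(x1x2yx1xx2yy_ids)
```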