Closed · speedcell4 closed this 3 years ago
Hi, thanks for the coming-soon source code. I have two questions about the dynamic adjustment of the sequence length.

I get that you use two consecutive [eos] tokens to indicate the end of the sequence. But in the middle of the sequence it is still possible to generate a single [eos], e.g., "I ate an [eos] apple [eos] [eos]", and you then need to remove all of these intermediate [eos] tokens. Is this correct?

- If this is true, why do you need two [eos] tokens instead of a single [eos]? You mentioned that "Once the decoded trajectory enters the [eos] state, the state transition term in S(X, Y_0) will be dominated by the transition score term t([eos], [eos])", so the point here is to make [eos] a black hole: once the decoding trajectory transitions into [eos], it has no chance to get out. If this is correct, why not simply give all [eos] -> non-[eos] transitions very negative weights and leave them fixed during training? (See the sketch at the end of this post.)

At the training stage, say the target sequence is "I ate an apple" and the length of the source sequence is 9. Which of the following do you use as the training target?

- I ate an apple [eos] [eos]
- I ate an apple [eos] [eos] [eos] [eos] [eos]

Hope I can get your reply, and thanks~
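To make the alternative concrete, here is a rough PyTorch sketch of what I mean (all names here are made up by me, this is not your code): fix the [eos] -> non-[eos] entries of the CRF transition matrix to a large negative constant and exclude them from gradient updates, so that [eos] becomes an absorbing state by construction.

```python
import torch
import torch.nn as nn

NEG_INF = -1e4  # large negative score, effectively forbids a transition

class ConstrainedTransitions(nn.Module):
    """CRF transition scores with [eos] hard-wired as an absorbing state."""

    def __init__(self, num_tags: int, eos_id: int):
        super().__init__()
        # Learnable transition scores t(i -> j).
        self.transitions = nn.Parameter(torch.zeros(num_tags, num_tags))
        # Mask is 0 exactly on the [eos] -> non-[eos] entries, 1 elsewhere.
        mask = torch.ones(num_tags, num_tags)
        mask[eos_id, :] = 0.0
        mask[eos_id, eos_id] = 1.0
        self.register_buffer("mask", mask)

    def forward(self) -> torch.Tensor:
        # Masked entries are replaced by a frozen, very negative constant, so
        # they receive no gradient and can never win during Viterbi decoding.
        return self.transitions * self.mask + NEG_INF * (1.0 - self.mask)
```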
Friend, I have this question too. Did you ever figure it out? I need your help. Recently there was an ACL 2021 paper that used this method for a Chinese grammatical error correction task. It is even more outrageous: the length of the target sentence is assumed to be known by default at test time, which left me completely confused.
@clearloveclearlove No, I am still waiting for the reply from the authors. By the way, which ACL 2021 paper do you mean? I am curious why you called it "outrageous".
I mean the paper "Tail-to-Tail Non-Autoregressive Sequence Prediction for Chinese Grammatical Error Correction". It uses the same BERT + CRF architecture for grammatical error correction, but in the code supplied by the author, the target-length information is used directly at test time, so ...
Hello, sorry for my very late reply... During training, we use this configuration: I ate an apple [eos] [eos]. We found that if we append many [eos] tokens, as in I ate an apple [eos] [eos] [eos] [eos] [eos], the model parameters are overwhelmed by the occurrence of the [eos] token and the model only learns to generate [eos]. In practice, generating sequences like "I ate an [eos] apple [eos] [eos]" is still possible, but putting two [eos] tokens in the training targets reduces this phenomenon. Feel free to ask follow-up questions! Sorry again for my late reply.
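For later readers, here is a minimal sketch of the target construction and post-processing discussed in this thread (the helper names are hypothetical and not taken from the released code): the training target is the reference followed by exactly two [eos] tokens, and the decoded output is cut at the first pair of consecutive [eos] tokens, with any stray single [eos] dropped.

```python
EOS = "[eos]"

def build_training_target(reference_tokens):
    # e.g. ["I", "ate", "an", "apple"] -> ["I", "ate", "an", "apple", "[eos]", "[eos]"]
    return list(reference_tokens) + [EOS, EOS]

def postprocess_decoded(tokens):
    # Stop at the first pair of consecutive [eos] tokens, then drop any
    # stray single [eos] generated in the middle of the sequence.
    kept = []
    for cur, nxt in zip(tokens, tokens[1:] + [None]):
        if cur == EOS and nxt == EOS:
            break
        kept.append(cur)
    return [t for t in kept if t != EOS]

print(build_training_target("I ate an apple".split()))
# ['I', 'ate', 'an', 'apple', '[eos]', '[eos]']
print(postprocess_decoded("I ate an [eos] apple [eos] [eos]".split()))
# ['I', 'ate', 'an', 'apple']
```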
I know this paper too. It is from one of the co-authors of NAG-BERT. I did not look into the details of his paper, but I can ask him about them if you need.
Thanks~