rshaojimmy opened 2 years ago
while <pad> in array:
remove <pad> from array
remove <eos> from array
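The pseudocode above could be sketched in Python like this (the token ids are hypothetical placeholders; real ids come from your tokenizer):

```python
# Hypothetical ids for the special tokens, chosen only for illustration.
PAD_ID, EOS_ID = 0, 2

def strip_special(tokens):
    """Remove every <pad> token and the <eos> token from a generated sequence."""
    return [t for t in tokens if t not in (PAD_ID, EOS_ID)]

print(strip_special([5, 9, 7, EOS_ID, PAD_ID, PAD_ID]))  # -> [5, 9, 7]
```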
Thanks for your quick reply!
But if I remove eos from the array, how can the model learn to stop generating the sentence without encountering the eos token?
The model itself will predict the eos token.
If the model doesn't predict the eos token and the entire sentence is gibberish, then the model isn't generalizing well or the data is insufficient.
But we should let trg[:-1] have the eos token when we calculate the loss, right? Like this: trg[:-1] = [sos, x_1, x_2, x_3, eos, pad, pad] or trg[:-1] = [sos, x_1, x_2, x_3, eos]
Depends on your training dataset.
If your dataset has special tokens like <eos> and <pad>, you can strip them with the loop above.
Thanks.
In all, I just want to create a dataset with sequences of different lengths. In such a dataset, I insert sos and eos at the beginning and end of each sequence as the ground truth, like this:
caps = [sos, x_1, x_2, x_3, eos]
In such a case,
caps[:, :-1] = [sos, x_1, x_2, x_3]
caps[:, 1:] = [x_1, x_2, x_3, eos]
This is what we want for the loss calculation.
outputs = model(samples, caps[:, :-1], cap_masks[:, :-1])
loss = criterion(outputs.permute(0, 2, 1), caps[:, 1:])
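A minimal runnable sketch of this shift-and-score step (PyTorch; the random logits stand in for the actual model output, and all ids are made up):

```python
import torch
import torch.nn as nn

PAD_ID = 0
VOCAB = 10
caps = torch.tensor([[1, 4, 5, 6, 2]])      # [sos, x_1, x_2, x_3, eos]

decoder_in = caps[:, :-1]                   # [sos, x_1, x_2, x_3]
targets = caps[:, 1:]                       # [x_1, x_2, x_3, eos]

# Stand-in for model(samples, caps[:, :-1], ...): logits of shape (batch, seq, vocab).
outputs = torch.randn(decoder_in.size(0), decoder_in.size(1), VOCAB)

criterion = nn.CrossEntropyLoss(ignore_index=PAD_ID)
# CrossEntropyLoss expects (batch, vocab, seq), hence the permute.
loss = criterion(outputs.permute(0, 2, 1), targets)
```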
However, given different lengths, I have to further insert pad tokens to make the lengths consistent, such as:
caps = [sos, x_1, x_2, x_3, eos, pad, pad, pad]
In such a case,
caps[:, :-1] = [sos, x_1, x_2, x_3, eos, pad, pad]
caps[:, 1:] = [x_1, x_2, x_3, eos, pad, pad, pad]
The input of model (caps[:, :-1]) will contain the eos token, which we want to remove.
Considering this, I just further replace the eos token with a pad token, as pad tokens are not included in the loss, like this:
caps[:, :-1] = [sos, x_1, x_2, x_3, pad, pad, pad]
And I keep caps[:, 1:] as
caps[:, 1:] = [x_1, x_2, x_3, eos, pad, pad, pad].
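A sketch of that replacement on a padded batch (PyTorch; the ids are hypothetical):

```python
import torch

PAD_ID, EOS_ID = 0, 2
caps = torch.tensor([[1, 4, 5, 6, EOS_ID, PAD_ID, PAD_ID, PAD_ID]])

decoder_in = caps[:, :-1].clone()
decoder_in[decoder_in == EOS_ID] = PAD_ID   # eos never enters the decoder input
targets = caps[:, 1:]                       # eos is kept here, so the loss can see it
```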
May I ask whether this makes sense?
You should include the eos token in the loss, because you want your model to learn when to stop generating a sentence.
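One common way to get exactly this behaviour (assuming a PyTorch-style setup, not necessarily this repo's exact code) is to pass the pad id as ignore_index, so pad positions contribute nothing while the eos position is still penalized:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

PAD_ID, EOS_ID, VOCAB = 0, 2, 7
torch.manual_seed(0)

targets = torch.tensor([[4, 5, 6, EOS_ID, PAD_ID, PAD_ID]])
logits = torch.randn(1, targets.size(1), VOCAB)

criterion = nn.CrossEntropyLoss(ignore_index=PAD_ID)
loss = criterion(logits.permute(0, 2, 1), targets)

# Dropping the pad positions by hand gives the same mean loss, confirming
# that only the real tokens, including eos, are scored.
keep = targets != PAD_ID
manual = F.cross_entropy(logits[keep], targets[keep])
assert torch.allclose(loss, manual)
```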
As I want the model to predict the eos token, I exclude it from the model input by simply slicing it off the end of the sequence. Thus:
trg = [sos, x_1, x_2, x_3, eos]
trg[:-1] = [sos, x_1, x_2, x_3]
This is also the same as your implementation.
But actually many datasets collect sentences of different lengths, and thus the last elements of the sentences are pad tokens, such as:
trg = [sos, x_1, x_2, x_3, eos, pad, pad, pad]
trg[:-1] = [sos, x_1, x_2, x_3, eos, pad, pad]
In such a case, I can't simply slice off the eos token. May I ask how I can solve this issue?