jadore801120 / attention-is-all-you-need-pytorch

A PyTorch implementation of the Transformer model in "Attention is All You Need".
MIT License

In patch_trg, I can't understand why you change the data shape like that #205

Open kwanhoP opened 1 year ago

kwanhoP commented 1 year ago

My dataset is arranged batch-first (each row is one sample), so I didn't use transpose(0, 1) and changed your code like below:

def patch_trg(trg, pad_idx):
    # drop the last column for the decoder input, drop the first column for the labels and flatten
    trg, gold = trg[:, :-1], trg[:, 1:].contiguous().view(-1)
    return trg, gold

And my dataset looks like the example below:

sample_1 = bos, 346, 32, 124, 214, eos
sample_2 = bos, 346, 124, 214, eos
...
sample_N = bos, 346, 32, 32, 32, 124, 214, eos

Every sample has a different length.

So, this is my question: if I run your code, when trg is made, the eos token of the longest sample in the batch is deleted. That means that, in every batch, the longest sample is trained without its eos token.
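To make the observation concrete, here is a minimal sketch of what the slicing does on a batch-first, padded target tensor (the token ids and pad value are made up for illustration, not taken from the repo):

import torch

pad_idx, bos, eos = 0, 1, 2
# two samples of different length, padded to the length of the longest one
batch = torch.tensor([
    [bos, 346,  32, 124, 214, eos],        # longest sample
    [bos, 346, 124, 214, eos, pad_idx],    # shorter sample, padded at the end
])

trg  = batch[:, :-1]                       # decoder input: last column dropped
gold = batch[:, 1:].contiguous().view(-1)  # labels: first column (bos) dropped, then flattened

# trg:
#   [[  1, 346,  32, 124, 214],
#    [  1, 346, 124, 214,   2]]   <- only the longest sample loses its eos; the shorter one loses a pad
# gold:
#   [346, 32, 124, 214, 2, 346, 124, 214, 2, 0]   <- eos is still present as a label for every sample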

So I want to know the correct role of that code (trg[:, :-1] and trg[:, 1:]).

I think gold is made to get rid of the bos token, but I don't understand what trg is for.

Gi-gigi commented 1 year ago

Hi bro, how did you get the program to work? The dataset doesn't download, and preprocess.py doesn't work.

kwanhoP commented 1 year ago

@Gi-gigi Actually, I didn't use the dataset that @jadore801120 prepared. I just used my own dataset and had to change a few things in preprocess.py; it's easy to customize the transformer code.

TIanCat commented 1 year ago

Perhaps I can answer your question. trg is used as the input of the decoder, and the decoder predicts the next word from the known prefix; gold is used as the label for those predicted words. Let me give you an example:

sample_1 = bos, 346, 32, 124, 214, eos
trg:  bos, 346, 32, 124, 214
gold: 346, 32, 124, 214, eos

So, the eos in trg is meaningless as a decoder input (there is nothing left to predict after it), and the loss function does not include it.
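As a rough sketch of how the shifted pair is typically consumed (simplified from the idea in the repo's training loss, with made-up shapes and a random stand-in for the decoder output), padding positions in gold are simply ignored, so the eos that was cut from trg never costs anything:

import torch
import torch.nn.functional as F

pad_idx = 0
vocab_size = 1000
batch_size, trg_len = 2, 5

# pretend decoder output for trg of shape (batch, trg_len, vocab) -- a random stand-in for a real model
pred = torch.randn(batch_size, trg_len, vocab_size)

# gold from patch_trg, shape (batch * trg_len,) -- next-token labels, still containing eos
gold = torch.tensor([346, 32, 124, 214, 2, 346, 124, 214, 2, 0])

# cross-entropy over every position; positions whose label is pad_idx contribute nothing
loss = F.cross_entropy(pred.view(-1, vocab_size), gold, ignore_index=pad_idx)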