huggingface / pytorch-openai-transformer-lm

🐥A PyTorch implementation of OpenAI's finetuned transformer language model with a script to import the weights pre-trained by OpenAI
MIT License

Can someone explain this line? #21

Open teucer opened 6 years ago

teucer commented 6 years ago

If my understanding is correct, this line finds the positions where there is a delimiter and filters for them. How does this help with training?

https://github.com/huggingface/pytorch-openai-transformer-lm/blob/253ca422bbf94b19da2a4aa8f1b294e01ab8be37/model_pytorch.py#L207

rodgzilla commented 6 years ago

By the time the information reaches the classification head, there is one vector of dimension n_embd associated with each position of each input. If you want a single prediction per input (as is the case for classification tasks), you have to select one of these vectors.

As the transformer network is auto-regressive, the vector you select has to be the rightmost one, which corresponds to clf_token in the input, since each input is built like this:

x12 = [start] + x1[:max_len] + [delimiter] + x2[:max_len] + [clf_token]
x13 = [start] + x1[:max_len] + [delimiter] + x3[:max_len] + [clf_token]
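
The selection described above can be sketched in plain Python. This is a minimal illustration, not the repository's code: the token ids, the toy hidden states, and the helper name `select_clf_states` are all made up for the example. It flattens the (batch, position) grid and keeps only the vectors whose input token is the classification token, which yields exactly one vector per input.

```python
# Hypothetical toy values; token ids and sizes are made up for illustration.
START, DELIM, CLF = 100, 101, 102
n_embd = 3

# Two input sequences of token ids, each ending with the classification token.
x = [
    [START, 5, 6, DELIM, 7, CLF],
    [START, 8, 9, DELIM, 4, CLF],
]

# Stand-in for the transformer output: one n_embd-sized vector per position.
# Here the vector at position (i, j) is simply [i, j, token_id].
h = [[[float(i), float(j), float(tok)] for j, tok in enumerate(seq)]
     for i, seq in enumerate(x)]


def select_clf_states(h, x, clf_token):
    """Flatten (batch, position) and keep the vectors whose token is clf_token."""
    flat_h = [vec for seq in h for vec in seq]
    flat_x = [tok for seq in x for tok in seq]
    return [vec for vec, tok in zip(flat_h, flat_x) if tok == clf_token]


clf_h = select_clf_states(h, x, CLF)
print(clf_h)  # one vector per input sequence
```

Because each input contains clf_token exactly once, at its rightmost position, the filter returns as many vectors as there are inputs.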

teucer commented 6 years ago

@rodgzilla Thanks a lot for the explanation. It makes a lot of sense! Out of curiosity, why can't all the values be used?

thomwolf commented 6 years ago

Well, for a classifier we usually want a fixed-length representation of the sentence, so we can't really use a varying number of values. Starting from that, the last hidden state is the most logical summary of the sentence. But there are other possible options of course, feel free to try your ideas!
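
As a rough illustration of the fixed-length point, the sketch below compares taking the last hidden state with one possible alternative, averaging over positions. Mean pooling is an assumption chosen for the example and is not proposed anywhere in this thread; the helper names and the toy vectors are hypothetical too. Both reductions return a vector of the same size regardless of how many positions the sequence has.

```python
def last_state(seq_states):
    """Summary used in the thread: the hidden state at the final position."""
    return seq_states[-1]


def mean_pool(seq_states):
    """Alternative summary (assumed for illustration): average over positions."""
    n = len(seq_states)
    dim = len(seq_states[0])
    return [sum(vec[d] for vec in seq_states) / n for d in range(dim)]


# Toy hidden states for a 2-position and a 3-position sequence (dimension 2).
short_seq = [[1.0, 2.0], [3.0, 4.0]]
long_seq = [[1.0, 0.0], [2.0, 0.0], [3.0, 6.0]]

for states in (short_seq, long_seq):
    # Both summaries have length 2 whatever the sequence length is.
    print(last_state(states), mean_pool(states))
```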

mehdimashayekhi commented 6 years ago

In the original OpenAI code (https://github.com/openai/finetune-transformer-lm/blob/bd1cf7d678926041e6d19193cab7e5cd8ce2fce6/train.py#L191), the model function in train.py contains the line clf_logits = clf(clf_h, 1, train=train). Why is ny equal to 1? Shouldn't it be 2, since we have two classes? Is there a reason to use 1 and then later reshape the second dimension of the logits to 2? I would really appreciate your help.
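
To make the shapes in the question concrete, here is a small sketch of the pattern it describes, assuming the head emits one scalar score per candidate input (such as x12 and x13 above) and the flat scores are then regrouped two per example. The scores and the helper names `reshape_pairs` and `softmax` are made up for illustration; this is not a copy of the OpenAI code.

```python
import math


def reshape_pairs(scores, n_choices=2):
    """Group a flat list of per-candidate scores into rows of n_choices."""
    return [scores[i:i + n_choices] for i in range(0, len(scores), n_choices)]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scalar scores: 3 examples x 2 candidates, flattened.
flat_scores = [0.5, 1.5, 2.0, -1.0, 0.0, 0.0]

logits = reshape_pairs(flat_scores)       # shape (3, 2): two logits per example
probs = [softmax(row) for row in logits]  # the candidates compete within each row
print(logits)
```

Under that assumption, an output size of 1 still produces two logits per example after the reshape, because every example contributes two scored candidates.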