huggingface / transfer-learning-conv-ai

🦄 State-of-the-Art Conversational AI with Transfer Learning

Parameter of forward pass (missing?) #56

Closed lukasfrank closed 4 years ago

lukasfrank commented 4 years ago

I'm studying the model and was wondering about a few things:

  1. The dataloader appends padding tokens, but no attention_mask is set during training. Is there any reason for that? (A sketch of this setup follows the list.)
  2. In your blog post you write that the model should take the positions of the tokens into account. Is there any reason why the position_ids parameter is not set?
  3. The lm_labels parameter: according to the documentation, labels set to -100 are ignored. Is that a typo in the documentation, or did I miss something?
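To make the setup concrete, a right-padded batch in this style looks roughly like the following (the token and padding ids are made up for illustration; this is not the repo's actual dataloader code):

```python
import torch

pad_id = 0  # hypothetical padding token id

# Two sequences of unequal length, right-padded to the same length.
batch = [
    [40, 1842, 3451],              # 3 real tokens
    [40, 716, 257, 1310, 21831],   # 5 real tokens
]
max_len = max(len(seq) for seq in batch)
input_ids = torch.tensor(
    [seq + [pad_id] * (max_len - len(seq)) for seq in batch]
)

# The forward pass is then called without attention_mask or position_ids,
# which is what questions 1 and 2 ask about, roughly:
#     model(input_ids, lm_labels=lm_labels)
```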
sshleifer commented 4 years ago
  1. Don't know.
  2. position_ids are inferred by model.forward if position_ids=None (the default) is passed.
  3. This code also uses a slightly older library version than the docs you linked. If you read the docstring for pytorch-transformers (the older library), it says "All labels set to -1 are ignored (masked)".
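Both points can be checked with a few lines of PyTorch (a minimal sketch with made-up tensors; the model call itself is not shown):

```python
import torch
from torch.nn import CrossEntropyLoss

# Point 2: when position_ids=None, GPT-style forward passes build them as a
# plain arange over the sequence length, equivalent to doing this explicitly:
input_ids = torch.tensor([[40, 1842, 3451, 0]])               # hypothetical ids
position_ids = torch.arange(input_ids.size(-1)).unsqueeze(0)  # [[0, 1, 2, 3]]

# Point 3: labels equal to ignore_index are dropped from the loss entirely.
# pytorch-transformers used -1; current transformers uses -100.
logits = torch.randn(1, 4, 50257)                # (batch, seq_len, vocab)
lm_labels = torch.tensor([[1842, 3451, 0, -1]])  # last position masked

loss_fct = CrossEntropyLoss(ignore_index=-1)
loss = loss_fct(logits.view(-1, logits.size(-1)), lm_labels.view(-1))
# Only the first three positions contribute to the loss.
```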
silverriver commented 3 years ago

@lukasfrank For your first question:

All the attention in TransferTransfo is masked (causal) self-attention, which means every token can only attend to earlier positions. Since the padding tokens are appended at the end of each sequence, the future mask alone already hides them: no real token ever attends to a padding position, so the padding tokens cannot influence the representations of the real tokens. (The padding positions themselves still attend to earlier tokens, but their outputs are ignored because their labels are masked.)
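A minimal sketch of this argument (plain PyTorch, not the repo's code): with right-padding, the causal mask alone guarantees that no real query position attends to a padding key position.

```python
import torch

seq_len, n_real = 6, 4                      # 4 real tokens, 2 padding tokens
causal_mask = torch.tril(torch.ones(seq_len, seq_len)).bool()

# Rows are query positions, columns are key positions. For every real query
# i < n_real, all padding columns j >= n_real lie in the future, so the
# causal mask already zeroes them out:
print(causal_mask[:n_real, n_real:].any())  # tensor(False)

# The padding rows do attend to earlier real tokens, but their outputs never
# reach the loss because their labels are masked (-1 / -100).
```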