Thanks! Did you do anything to verify that this doesn't affect metrics or otherwise break anything?
I ran it on a smaller subset of the dataset and tried the interact script; everything seemed fine. I'm currently training it on the full dataset. It'll take a while on my VM, but I'll report back once that finishes and I run the ConvAI2 evaluation scripts!
@sshleifer I trained with the parameters:
--model="openai-gpt" --device="cuda" --n_epochs=1
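For reference, the full invocation would have looked roughly like this (a sketch assuming the repo's train.py, with all other flags left at their defaults):
python train.py --model="openai-gpt" --device="cuda" --n_epochs=1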
the evaluation script finished with
[ Finished evaluating tasks ['convai2:self'] using datatype valid ]
{'exs': 7801, 'hits@1': 0.75, 'hits@5': 0.961, 'hits@10': 0.992, 'hits@100': 1.0}
============================
FINAL Hits@1: 0.75
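(For context: hits@k measures how often the model ranks the gold response among its top k scored candidates. A minimal sketch of the idea, not ParlAI's actual implementation:)
import torch

def hits_at_k(candidate_scores, gold_index, k):
    # True if the gold response is among the k highest-scoring candidates
    return gold_index in torch.topk(candidate_scores, k).indices.tolist()

scores = torch.tensor([0.1, 2.3, 0.7, 1.9])  # model scores for 4 candidate responses
print(hits_at_k(scores, gold_index=1, k=1))  # True: the gold candidate scored highest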
Do we need anything else to verify metrics?
Are those metrics ~equivalent to those in the preview here?
https://github.com/huggingface/transfer-learning-conv-ai/pull/30
Sorry I'm being so lazy, mostly focused on main repo :)
No worries, the tensorboard metrics are pretty close.
Results from your PR:
{'accuracy': 0.7466991411357519,
 'average_accuracy': 0.7466991411357519,
 'average_nll': 2.6821035040007972,
 'average_ppl': 14.615805388160778,
 'nll': 2.6821035040007972}
Results from my PR:
{'accuracy': 0.7496474981307983,
 'average_accuracy': 0.7496474981307983,
 'average_nll': 2.6389272212982178,
 'average_ppl': 13.99817943572998,
 'nll': 2.6389272212982178}
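(Sanity check: average_ppl is exp(average_nll), so each run is internally consistent: exp(2.6821) ≈ 14.616 and exp(2.6389) ≈ 13.998, matching the average_ppl values above.)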
👍
@acarrera94 were you able to test these changes with gpt2? Specifically, did you try testing interact.py with gpt2?
When I try to interact with a model trained using gpt2, I get the following error:
logits = logits[0, -1, :] / args.temperature
IndexError: too many indices for tensor of dimension 2
I think the logits = logits[0] fix is not applicable with the latest transformers and is just an artifact of pytorch_pretrained_bert, and hence can be removed.
@acarrera94 @julien-c @thomwolf
Good question. I didn't have enough memory to train gpt2 on my VM, but if you have a checkpoint I can download, I'd be happy to take a look at it.
@acarrera94 you don't need to be able to train GPT-2 to figure out that this is a bug.
Do something like this in your Python interpreter:
>>> import pytorch_pretrained_bert
>>> from pytorch_pretrained_bert import GPT2LMHeadModel, GPT2Tokenizer
>>> print(pytorch_pretrained_bert.__version__)
0.6.2
>>> model = GPT2LMHeadModel.from_pretrained("gpt2")
>>> tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
>>> input = "hello how are you"
>>> input_ids = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(input))
>>> import torch
>>> input_ids = torch.tensor(input_ids)
>>> input_ids = input_ids.view(1,-1)
>>> outputs = model(input_ids)
>>> outputs[0].shape
torch.Size([1, 4, 50257])
>>> import transformers
>>> from transformers import GPT2LMHeadModel
>>> print(transformers.__version__)
2.3.0
>>> model = GPT2LMHeadModel.from_pretrained("gpt2")
>>> outputs = model(input_ids)
>>> outputs[0].shape
torch.Size([1, 4, 50257])
Clearly, the logits are a 3-dimensional tensor for gpt2. So the logits = logits[0] fix won't work for gpt2, even with pytorch_pretrained_bert 0.6.2.
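For what it's worth, here's a minimal sketch of what the sampling step in interact.py could look like once that stale line is dropped (variable names like args.temperature and input_ids are assumed from the script; this is illustrative, not the exact code):
outputs = model(input_ids)
logits = outputs[0] if isinstance(outputs, tuple) else outputs  # 3-d tensor: (batch, seq_len, vocab)
logits = logits[0, -1, :] / args.temperature  # logits for the last position of the first sequence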
cc @sshleifer since I think I remember you mentioning in one of your past pull requests that you were unable to interact with GPT-2 in the code as it currently stands -- UPDATE: I found it, this was the one where you mentioned this: https://github.com/huggingface/transfer-learning-conv-ai/pull/29
also cc @KasparPeterson since you were the one who introduced this fix in the first place in this pull request: https://github.com/huggingface/transfer-learning-conv-ai/pull/6
I think some changes were made to pytorch_pretrained_bert (by 0.6.2) after @KasparPeterson introduced this fix, rendering it unnecessary. Basically, the logits used to be a 4-dimensional tensor for gpt2, which is what made the fix necessary at the time. @thomwolf can confirm, since he approved Kaspar's pull request here.
Updated dependencies and imports to use transformers. Updated ignore_index and the ignored values in tensors to -100, matching the defaults in transformers and pytorch. Updated the checkpoint save to use the correct path. #60
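As a quick illustration of the -100 convention (a minimal sketch, not code from this repo), positions labeled -100 simply don't contribute to the loss:
import torch
import torch.nn.functional as F

logits = torch.randn(1, 4, 10)               # (batch, seq_len, vocab_size)
labels = torch.tensor([[3, 7, -100, -100]])  # -100 marks positions to ignore
loss = F.cross_entropy(logits.view(-1, 10), labels.view(-1))  # ignore_index defaults to -100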