huggingface / transfer-learning-conv-ai

🦄 State-of-the-Art Conversational AI with Transfer Learning

Update from pytorch-transformers to transformers library #61

Closed andr-ec closed 4 years ago

andr-ec commented 4 years ago

Updated dependencies and imports to use transformers. Updated ignore_index and ignored values in tensors to be -100, matching the default in transformers and PyTorch. Updated the checkpoint save to use the correct path. #60
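
For context, a minimal sketch of the -100 convention (toy shapes, not the training script itself): PyTorch's cross-entropy loss skips any target equal to ignore_index, which defaults to -100, so labeling masked positions with -100 keeps them out of the loss.

    import torch
    import torch.nn.functional as F

    # Toy logits: batch of 1, sequence of 3 positions, vocab of 5 tokens.
    logits = torch.randn(1, 3, 5)

    # Positions that shouldn't be scored are labeled -100, the default
    # ignore_index of PyTorch's cross entropy (and transformers' convention).
    labels = torch.tensor([[2, -100, 4]])

    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1))
    print(loss)  # the -100 position contributes nothing to the loss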

sshleifer commented 4 years ago

Thanks! Did you do anything to verify that this doesn't affect metrics or otherwise break anything?

andr-ec commented 4 years ago

I ran it on a smaller subset of the dataset and ran the interact script, it seemed to be fine. I'm currently training it on the full dataset. It'll take a while on my vm, but I'll report back once that finishes and I run the ConvAI2 evaluation scripts!

andr-ec commented 4 years ago

@sshleifer I trained with the parameters --model="openai-gpt" --device="cuda" --n_epochs=1, and the evaluation script finished with:

[ Finished evaluating tasks ['convai2:self'] using datatype valid ]
{'exs': 7801, 'hits@1': 0.75, 'hits@5': 0.961, 'hits@10': 0.992, 'hits@100': 1.0}
============================
FINAL Hits@1: 0.75

andr-ec commented 4 years ago

Do we need anything else to verify metrics?

sshleifer commented 4 years ago

Are those metrics ~equivalent to those in the preview here?

https://github.com/huggingface/transfer-learning-conv-ai/pull/30

Sorry I'm being so lazy, mostly focused on main repo :)

andr-ec commented 4 years ago

No worries, the tensorboard metrics are pretty close. Results from your PR:

{'accuracy': 0.7466991411357519,
'average_accuracy': 0.7466991411357519,
'average_nll': 2.6821035040007972,
'average_ppl': 14.615805388160778,
'nll': 2.6821035040007972}

Results from my PR:

{'accuracy': 0.7496474981307983,
'average_accuracy': 0.7496474981307983,
'average_nll': 2.6389272212982178,
'average_ppl': 13.99817943572998,
'nll': 2.6389272212982178}
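
As a quick sanity check (my note: perplexity is just the exponential of the average NLL, so the two numbers in each run agree with each other):

    import math

    # Perplexity is exp(average negative log-likelihood).
    print(math.exp(2.6821035040007972))  # ~14.6158, matches the old average_ppl
    print(math.exp(2.6389272212982178))  # ~13.9982, matches the new average_ppl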

julien-c commented 4 years ago

👍

g-karthik commented 4 years ago

@acarrera94 were you able to test these changes with gpt2? Specifically, did you try testing interact.py with gpt2?

g-karthik commented 4 years ago

When I try to interact with a model trained using gpt2, I get the following error:

    logits = logits[0, -1, :] / args.temperature
IndexError: too many indices for tensor of dimension 2

I think the logits = logits[0] fix is not applicable with the latest transformers; it's just an artifact of pytorch_pretrained_bert, and hence can be removed.
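
A minimal repro of that failure mode (shapes are illustrative, using the gpt2 vocab size):

    import torch

    # transformers' GPT2LMHeadModel returns logits of shape (batch, seq, vocab).
    logits = torch.randn(1, 4, 50257)

    logits = logits[0]  # the old "fix": tensor is now 2-D, shape (4, 50257)
    logits = logits[0, -1, :] / 0.7  # IndexError: too many indices for tensor of dimension 2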

@acarrera94 @julien-c @thomwolf

andr-ec commented 4 years ago

Good question, I didn't have enough memory to train gpt2 on my VM, but if you have a checkpoint I can download, I'd be happy to take a look at it.

g-karthik commented 4 years ago

@acarrera94 you don't need to be able to train GPT-2 to figure out that this is a bug.

Do something like this in your Python interpreter:

>>> import pytorch_pretrained_bert
>>> from pytorch_pretrained_bert import GPT2LMHeadModel, GPT2Tokenizer
>>> print(pytorch_pretrained_bert.__version__)
0.6.2
>>> model = GPT2LMHeadModel.from_pretrained("gpt2")
>>> tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
>>> input = "hello how are you"
>>> input_ids = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(input))
>>> import torch
>>> input_ids = torch.tensor(input_ids)
>>> input_ids = input_ids.view(1,-1)
>>> outputs = model(input_ids)
>>> outputs[0].shape
torch.Size([1, 4, 50257])
>>> import transformers
>>> from transformers import GPT2LMHeadModel
>>> print(transformers.__version__)
2.3.0
>>> model = GPT2LMHeadModel.from_pretrained("gpt2")
>>> outputs = model(input_ids)
>>> outputs[0].shape
torch.Size([1, 4, 50257])

Clearly, the logits tensor is 3-dimensional for gpt2. So the logits = logits[0] fix won't work for gpt2, even with pytorch_pretrained_bert 0.6.2.
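
So the fix is just to drop that line and slice the 3-D tensor directly; a sketch under transformers 2.3.0 (0.7 stands in for args.temperature):

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    input_ids = torch.tensor(tokenizer.encode("hello how are you")).unsqueeze(0)
    outputs = model(input_ids)  # tuple; outputs[0] is the (batch, seq, vocab) logits

    logits = outputs[0][0, -1, :] / 0.7  # last-position logits, no logits = logits[0] step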

cc @sshleifer since I think I remember you mentioning in one of your past pull requests that you were unable to interact with GPT-2 in the code as it currently stands. UPDATE: I found it; this is the one where you mentioned it: https://github.com/huggingface/transfer-learning-conv-ai/pull/29

also cc @KasparPeterson since you were the one who introduced this fix in the first place in this pull request: https://github.com/huggingface/transfer-learning-conv-ai/pull/6

I think some changes were made to pytorch_pretrained_bert 0.6.2 after @KasparPeterson introduced this fix, rendering it unnecessary: the logits used to be a 4-dimensional tensor for gpt2, which is what made the fix necessary at the time. @thomwolf can confirm, since he approved Kaspar's pull request here.