huggingface / transfer-learning-conv-ai

🦄 State-of-the-Art Conversational AI with Transfer Learning

Questions about ppl when using gpt2 #63

Open · ssxy00 opened this issue 4 years ago

Hi! I ran into some problems when running the ConvAI2 evaluation scripts:

I first trained a model from OpenAI GPT. I increased the number of gradient accumulation steps because I only have one GPU.

python train.py --model_checkpoint /path/to/pretrained/gpt \
--gradient_accumulation_steps=32 --lm_coef=2.0 --max_history=2 \
--n_epochs=1 --num_candidates=4 --personality_permutations=2 \
--train_batch_size=2 --valid_batch_size=2
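
For context, raising gradient_accumulation_steps compensates for the small per-card batch: with train_batch_size=2 and gradient_accumulation_steps=32, each optimizer update effectively aggregates 2 × 32 = 64 samples. A minimal, self-contained PyTorch sketch of that accumulation pattern (not the repo's actual train.py loop; the model and data below are stand-ins) is:

import torch
from torch import nn

accumulation_steps = 32
model = nn.Linear(10, 1)                             # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

optimizer.zero_grad()
for step in range(accumulation_steps * 4):           # 4 optimizer updates in total
    x, y = torch.randn(2, 10), torch.randn(2, 1)     # micro-batch of size 2, like train_batch_size=2
    loss = nn.functional.mse_loss(model(x), y)
    (loss / accumulation_steps).backward()           # scale so the accumulated gradient averages over 32 steps
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                             # one parameter update per 32 micro-batches
        optimizer.zero_grad()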

This gives the following ConvAI2 evaluation results:

Final Hits@1: 0.761
FINAL F1: 0.1659
FINAL PPL: 20.7

Then I tried to train from GPT2-small with the same config:

python train.py --model_checkpoint /path/to/pretrained/gpt2 \
--gradient_accumulation_steps=32 --lm_coef=2.0 --max_history=2 \
--n_epochs=1 --num_candidates=4 --personality_permutations=2 \
--train_batch_size=2 --valid_batch_size=2

and the evaluation results are:

Final Hits@1: 0.737
FINAL F1: 0.1643
FINAL PPL: 178.9
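
For scale: perplexity is exp of the mean per-token cross-entropy, so 20.7 versus 178.9 is a gap of roughly two nats per token, much larger than the Hits@1 and F1 differences would suggest:

import math

print(math.log(20.7))    # ~3.03 nats per token for the OpenAI GPT run
print(math.log(178.9))   # ~5.19 nats per token for the GPT-2 run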

The command I used to run convai_evaluation.py is:

python convai_evaluation.py --eval_type ppl --model_checkpoint /path/to/finetuned/model

The ppl of GPT2 is strangely high.
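
As a rough sanity check on the checkpoint itself (separate from convai_evaluation.py), one can compute a plain language-modeling perplexity with the transformers API. This is only a sketch: the path and sample sentence are placeholders, and it ignores the persona/history formatting and special tokens this repo adds, so it will not reproduce the ConvAI2 PPL, but it can flag a broken checkpoint:

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_path = "/path/to/finetuned/model"        # placeholder path to the fine-tuned checkpoint
tokenizer = GPT2Tokenizer.from_pretrained(model_path)
model = GPT2LMHeadModel.from_pretrained(model_path).eval()

text = "i like to ski . my wife does not like me anymore ."   # sample PersonaChat-style sentence
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    loss = model(**inputs, labels=inputs["input_ids"]).loss   # mean cross-entropy over tokens
print(torch.exp(loss).item())                                  # perplexity = exp(loss)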

Is there anything that needs to be modified when testing fine-tuned GPT-2 with convai_evaluation.py?

I'm also curious about the best test results and hyperparameters you got when you fine-tuned from GPT-2. Thank you!

seyos11 commented 2 years ago

Did you find out how to achieve better results? I have the same problem with GPT-2, which leads to a final ppl of 133.