timoschick / pet

This repository contains the code for "Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference"
https://arxiv.org/abs/2001.07676
Apache License 2.0

PET results different from reported in huggingface blog "How many data points is a prompt worth?" study #69


luffycodes commented 2 years ago

For MNLI, the blog (https://huggingface.co/blog/how_many_data_points/) reports an accuracy of 0.83 with 1,000 training samples.

In the paper (https://arxiv.org/pdf/2001.07676.pdf), Table 1 reports an accuracy of 0.85 for MNLI with 1,000 training samples.

I was wondering how the accuracy reported in the PET paper was obtained, and what might explain this difference.

timoschick commented 2 years ago

Hi @luffycodes, the accuracy reported in the PET paper is exactly what you obtain using this library. You can find details about the "How many data points is a prompt worth?" study in their paper; one important difference from our experiments is that they

[...] run every experiment 4 times in order to reduce variance,

Also, I would assume that they have used a different random selection of 1,000 training examples (but to verify this, you should reach out to the authors directly).
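To make the two sources of variance concrete, here is a minimal sketch of an evaluation protocol that both samples a different random 1,000-example subset per run and averages accuracy over several runs. All names here (`run_experiment`, `averaged_accuracy`) are hypothetical placeholders, not part of the pet library or the blog's actual code; `run_experiment` just simulates a noisy accuracy around 0.83.

```python
import random
import statistics

def run_experiment(train_subset, seed):
    # Hypothetical stand-in for training and evaluating a model on
    # `train_subset`; returns a simulated accuracy that varies with the seed.
    rng = random.Random(seed)
    return 0.83 + rng.uniform(-0.02, 0.02)

def averaged_accuracy(dataset, n_train=1000, n_runs=4, base_seed=0):
    # Average accuracy over n_runs, each with its own random subset of
    # n_train examples (mirroring the "run every experiment 4 times" setup).
    accuracies = []
    for run in range(n_runs):
        rng = random.Random(base_seed + run)
        subset = rng.sample(dataset, n_train)
        accuracies.append(run_experiment(subset, base_seed + run))
    return statistics.mean(accuracies)

dataset = list(range(10_000))  # placeholder for MNLI training examples
print(round(averaged_accuracy(dataset), 3))
```

With a different `base_seed` (i.e., a different selection of 1,000 examples), the averaged accuracy shifts, which is one plausible reason the blog's 0.83 and the paper's 0.85 need not coincide.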