timoschick / pet

This repository contains the code for "Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference"
https://arxiv.org/abs/2001.07676
Apache License 2.0

Imbalanced data #17

Closed. hubert98 closed this issue 3 years ago.

hubert98 commented 3 years ago

I would like to use PET on imbalanced data (class 1 makes up ~80% of the data; classes 2-5 together make up ~20%). Do you have any experience with, or training tips for, using PET with imbalanced classes?

timoschick commented 3 years ago

Hi @hubert98, the only imbalanced dataset we've used PET for so far is CommitmentBank, which is part of SuperGLUE. For this dataset, we found that iterative PET (iPET) greatly improves performance, so if you have enough unlabeled data, this is something I would definitely recommend trying. In iPET, each new training set adopts the class distribution of the original training set. So if you happen to know that the class distribution of your training data differs from the (expected) distribution of your test data, you may need to change this manually. Apart from that, I don't have any tips for using PET with imbalanced classes, as my experience with PET on such datasets is limited.
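
To make the point about class distributions concrete, here is a minimal sketch of how per-class example counts for a new training set could be derived from the original training labels, with an optional manual override for when the test distribution is known to differ. This is not taken from the PET codebase: the function `examples_per_label` and its parameters are hypothetical names chosen for illustration, and the rounding rule and one-example minimum are assumptions.

```python
from collections import Counter
from typing import Dict, Hashable, Optional, Sequence


def examples_per_label(
    train_labels: Sequence[Hashable],
    new_set_size: int,
    target_distribution: Optional[Dict[Hashable, float]] = None,
) -> Dict[Hashable, int]:
    """Hypothetical helper (not part of PET): decide how many examples of
    each label a new training set of roughly `new_set_size` should contain."""
    if target_distribution is None:
        # Default: copy the class distribution of the original training set.
        counts = Counter(train_labels)
        total = sum(counts.values())
        shares = {label: n / total for label, n in counts.items()}
    else:
        # Manual override: normalize the supplied weights so they sum to 1.
        total_weight = sum(target_distribution.values())
        shares = {
            label: weight / total_weight
            for label, weight in target_distribution.items()
        }

    # Assumption: keep at least one example per label so that rare classes
    # do not disappear from the new training set.
    return {
        label: max(1, round(share * new_set_size))
        for label, share in shares.items()
    }


# Original training set: class 1 is ~80%, classes 2-5 share the remaining ~20%.
labels = ["c1"] * 80 + ["c2"] * 5 + ["c3"] * 5 + ["c4"] * 5 + ["c5"] * 5

# Counts that mirror the training distribution.
mirrored = examples_per_label(labels, new_set_size=200)

# Counts for a manually specified (here: less skewed) target distribution.
manual = examples_per_label(
    labels,
    new_set_size=200,
    target_distribution={"c1": 0.5, "c2": 0.125, "c3": 0.125, "c4": 0.125, "c5": 0.125},
)
```

Because of rounding and the one-example minimum, the counts may not sum exactly to `new_set_size`; a real setup would need to decide how to handle that.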

hubert98 commented 3 years ago

Thank you very much for the feedback!