timoschick / pet

This repository contains the code for "Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference"
https://arxiv.org/abs/2001.07676
Apache License 2.0

Multi Token Label #20

Closed hubert98 closed 3 years ago

hubert98 commented 3 years ago

I am not sure whether the identity-function solution from your recent paper is already implemented in the code:

When I want to use multi-token labels - for example: 1) very good 2) good 3) ok 4) bad 5) very bad

very good / very bad are multi-token.

Can you give me a short (for dummies) idea of how to implement and train that?

timoschick commented 3 years ago

Hi @hubert98, making multi-token labels easier to use is something that I've been wanting to do for quite a while, but I haven't found the time to do so yet. Unfortunately, I'll be on vacation starting tomorrow until January 6th, but I'll try to provide you with an example as soon as I'm back. In the meantime, you may take a look at task_helpers.py (in particular, RecordTaskHelper might be of interest), but this is different from your use case in that the set of labels is different for each input example, and the current implementation is not really optimized with regards to readability.

hubert98 commented 3 years ago

Hi @timoschick - happy new year, and I hope you had a great time. I am a bit lost with the RecordTaskHelper - how would the labels (e.g. "very good") go into the respective PVP?

Maybe you can give a short example for dummies (me)?

TZeng20 commented 3 years ago

Hi @hubert98 and @timoschick, I am also interested in this idea of using multiple tokens to represent a label. I think it was mentioned on the repository's home page, in this paper from the same authors.

Please let me know if I have understood the idea correctly (based on section 3.1, equation 4). I believe the core idea is that, when there are multiple [MASK] tokens, the probability of a multi-token verbalization is obtained by multiplying together the probabilities the model assigns at each [MASK] position.

Let x be the original text, P(x) be the cloze question, and q(v(y) | x) = M(v(y) | P(x)) be the score the MLM M assigns to the verbalization v(y). If we use the same labels and verbalizations that you have provided:

P(x) = 'some text. In summary it was [MASK]'
P^2(x) = 'some text. In summary it was [MASK_1][MASK_2]' (the superscript 2 means there are 2 mask tokens)
q(good | x) = M(good | P^2(x))
q(very good | x) = M(very | P^2(x), [MASK_1]) × M(good | P^2(x), [MASK_2])
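The factorization above can be sketched with toy numbers. Everything here is illustrative: the logits, the three-word vocabulary, and the `score` helper are made up, while a real MLM would produce a distribution over its full vocabulary at each mask position.

```python
import math

def softmax(logits):
    """Turn a dict of token -> logit into a dict of token -> probability."""
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}

# Hypothetical logits for the two mask positions in
# 'some text. In summary it was [MASK_1][MASK_2]'
logits_mask1 = {"very": 2.0, "good": 1.0, "bad": 0.5}
logits_mask2 = {"very": 0.1, "good": 2.5, "bad": 1.0}

p1 = softmax(logits_mask1)  # distribution at [MASK_1]
p2 = softmax(logits_mask2)  # distribution at [MASK_2]

def score(tokens, dists):
    """Multiply the per-position probabilities of a multi-token verbalization."""
    p = 1.0
    for tok, dist in zip(tokens, dists):
        p *= dist[tok]
    return p

print(score(["very", "good"], [p1, p2]))  # q(very good | x) under the toy model
```

Note that this treats the two mask positions as independent given the input, which is exactly the simplifying assumption the multiplication in equation 4 makes.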

timoschick commented 3 years ago

Hi @hubert98 and @TZeng20, I have now added a way of using multiple masks that does not require you to manually define a custom TaskHelper. You can find all relevant information here: https://github.com/timoschick/pet#pet-with-multiple-masks I didn't have the time to test this feature thoroughly, so please let me know if anything doesn't work as expected, or raise a new issue if you have further questions.