mingkaid / rl-prompt

Accompanying repo for the RLPrompt paper

Could RLPrompt be applied to zero-shot settings? #9

Closed shadowkiller33 closed 1 year ago

shadowkiller33 commented 1 year ago

Could you please talk about the performance in the zero-shot setting?

mingkaid commented 1 year ago

Thank you for your interest. Even without supervised data (i.e., the zero-shot setting), our framework can learn prompts that achieve good performance by using weak supervision signals as the reward function.

For example, we applied our framework to unsupervised text style transfer and achieved superior or competitive performance compared to a variety of training and prompting baselines (please see the screenshot below, which shows Table 4 of our ArXiv PDF as of Sep 2nd, 2022).

[Screenshot: Table 4 of the ArXiv PDF, unsupervised text style transfer results]
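To give a concrete picture of what such a weak supervision signal can look like, below is a minimal sketch that rewards generated outputs by how confidently an off-the-shelf style classifier assigns them the target style. The checkpoint and function names here are illustrative, not the repo's actual reward code, and the paper's full style-transfer reward also balances style against content preservation:

```python
# Minimal sketch of a weak-supervision reward for zero-shot style transfer
# (illustrative only; not the repo's actual reward implementation).
from transformers import pipeline

# Off-the-shelf sentiment classifier used as a weak supervision signal;
# the checkpoint name is just an example.
style_clf = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

def style_transfer_reward(generated_texts, target_label="POSITIVE"):
    """Reward each generated text by the classifier's confidence
    that it is in the target style."""
    preds = style_clf(generated_texts)
    return [p["score"] if p["label"] == target_label else 1.0 - p["score"]
            for p in preds]

# Example: text that already reads as positive gets a higher reward.
print(style_transfer_reward(["What a wonderful day!", "This is awful."]))
```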

Let us know if you have other questions. I'm closing this issue now because it's a clarification question.

shadowkiller33 commented 1 year ago

Thanks for your reply. I can see the zero-shot performance on text style transfer. Sorry for being unclear; I'm curious about the zero-shot performance on text classification, which seems to be illustrated in the paper.

mingkaid commented 1 year ago

I'm not sure where we illustrated "zero-shot performance on text classification". Could you point to any specific section or figure?

In general, prompts learned by our method do transfer well across related datasets. For instance, we provide the learned prompt `Absolutely VERY absolute VERY absolute` in examples/few-shot-classification/README.md. This prompt was learned on the SST-2 dataset, but it transfers well to other binary sentiment classification datasets without seeing any additional training examples; see our test performance below (a usage sketch follows the table):

| Dataset | Accuracy (%) | Best Baseline Accuracy (%) |
| --- | --- | --- |
| SST-2 | 92.7 | 89.1 |
| Yelp-2 | 95.0 | 93.2 |
| MR | 88.0 | 86.6 |
| CR | 89.6 | 87.4 |
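To illustrate the transfer setting, here is a minimal sketch of applying such a learned prompt to a new binary sentiment dataset with a masked LM. The model checkpoint, template order, and verbalizer words are assumptions for illustration, not the repo's actual evaluation setup (see examples/few-shot-classification for that):

```python
# Minimal sketch of transferring a learned prompt to a new dataset
# (illustrative; template and verbalizers are assumed, not from the repo).
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL_NAME = "roberta-large"  # assumed masked LM
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME).eval()

PROMPT = "Absolutely VERY absolute VERY absolute"  # prompt learned on SST-2
# Assumed label words; the repo's verbalizer config may differ.
VERBALIZERS = {"negative": " terrible", "positive": " great"}

def classify(sentence: str) -> str:
    # Assumed template: sentence, then the learned prompt, then a mask
    # slot that the MLM fills with a label word.
    text = f"{sentence} {PROMPT} {tokenizer.mask_token}"
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0].item()
    scores = {
        label: logits[0, mask_pos, tokenizer.encode(word, add_special_tokens=False)[0]].item()
        for label, word in VERBALIZERS.items()
    }
    return max(scores, key=scores.get)

print(classify("A charming and affecting little film."))  # expected: positive
```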

Let us know if you have other questions.

MM-IR commented 1 year ago

Besides, for this type of transferred prompt, our framework supports training with hand-written examples, e.g. the same setting as 16-shot. Also, in our preliminary 1-shot experiments we did find several promising prompts (though not necessarily better than the 16-shot ones), as well as biased prompts that look more like overfitting to the training examples.
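For reference, a minimal sketch of building such a k-shot training set from hand-written or existing labeled examples; this is illustrative and not the repo's own data loader:

```python
# Build a balanced k-shot split (e.g., k=16) from labeled examples
# (illustrative helper; not the repo's actual data pipeline).
import random
from collections import defaultdict

def make_few_shot_split(examples, k=16, seed=0):
    """examples: list of (text, label) pairs; returns k examples per label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    subset = []
    for label, items in by_label.items():
        subset.extend(rng.sample(items, k))
    return subset
```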