mingkaid / rl-prompt

Accompanying repo for the RLPrompt paper
MIT License

question: Is the model used in the policy network always distilgpt? #36

Closed pascalhuszar closed 8 months ago

pascalhuszar commented 9 months ago

In the paper you first state, "The policy LM need not be the same as the LM we optimize the prompt for (i.e., task LM)." In the Few-Shot Text Classification section you mention using RoBERTa-large as the "backbone model", but then in the appendix you state, "For all tasks, we uniformly use distilGPT-2." I'm a bit confused. Are the policy network model and the task model (for which the prompt is optimized) the same? Am I right that for the text classification task, both BERT and GPT could be utilized?

MM-IR commented 8 months ago

There are two separate models: the backbone LMs of interest, which are the ones being prompted (the task LMs), and a single prompt-generation LM that serves as the policy network (i.e., distilGPT-2).
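
To make the distinction concrete, here is a minimal sketch of the two roles using HuggingFace `transformers`. This is not the repository's actual API; the model names, label words, and example sentence are illustrative. A small policy LM proposes prompt tokens, and a separate task LM is scored with that prompt prepended to the input.

```python
import torch
from transformers import AutoModelForCausalLM, AutoModelForMaskedLM, AutoTokenizer

# Policy LM (prompt generator / policy network), e.g. distilGPT-2.
policy_tok = AutoTokenizer.from_pretrained("distilgpt2")
policy_lm = AutoModelForCausalLM.from_pretrained("distilgpt2")

# Task LM (the backbone model being prompted), e.g. RoBERTa-large as a masked LM.
task_tok = AutoTokenizer.from_pretrained("roberta-large")
task_lm = AutoModelForMaskedLM.from_pretrained("roberta-large")

# 1) The policy LM samples a few prompt tokens (heavily simplified: RLPrompt trains
#    this sampling with RL, whereas this just draws an untrained sample).
start = torch.tensor([[policy_tok.bos_token_id]])
with torch.no_grad():
    prompt_ids = policy_lm.generate(
        start, max_new_tokens=5, do_sample=True,
        pad_token_id=policy_tok.eos_token_id,
    )
prompt_text = policy_tok.decode(prompt_ids[0, 1:], skip_special_tokens=True)

# 2) The task LM classifies an input with the prompt prepended, e.g. by comparing
#    the scores of two label words at the <mask> position.
text = "The movie was fantastic."
inputs = task_tok(
    f"{prompt_text} {text} It was {task_tok.mask_token}.", return_tensors="pt"
)
with torch.no_grad():
    logits = task_lm(**inputs).logits
mask_pos = (inputs.input_ids[0] == task_tok.mask_token_id).nonzero().item()
label_ids = [task_tok(" great")["input_ids"][1], task_tok(" terrible")["input_ids"][1]]
print(prompt_text, logits[0, mask_pos, label_ids])  # reward is derived from scores like these
```

In RLPrompt itself the task LM stays frozen; only the prompt-generation policy is trained from the downstream reward.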

MM-IR commented 8 months ago

It may help to revisit our method formulation: it is a prompt-generation framework for prompt optimization.

indirected commented 3 months ago

Sorry for reopening this, but I am a bit confused as to which model is being used for what. From my understanding of the code, the model generating the prompts is always a flavor of GPT-2. However, in the paper, the captions of Figures 4 and 7 (the heatmaps) state:

The columns represent the models used to learn the prompts, and the rows represent the models we perform classification with. Brighter color represents higher accuracy.

Also, there are RoBERTa models in the heatmap. Does this mean that RoBERTa was used to generate the prompt, OR that, while training the GPT-2 model to generate the prompts, the downstream LM was RoBERTa, and then at test time the downstream model was swapped for something else to test for transferability?
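
To make my reading of the caption concrete, here is a rough sketch of the grid I think the figures describe; `learn_prompt` and `evaluate_prompt` are hypothetical placeholders, not functions from this repository:

```python
from typing import Dict, Tuple

def learn_prompt(prompt_learning_lm: str) -> str:
    """Placeholder: run prompt optimization with distilGPT-2 as the policy network
    and `prompt_learning_lm` as the task LM; return the discovered prompt string."""
    return f"<prompt optimized against {prompt_learning_lm}>"

def evaluate_prompt(prompt: str, classification_lm: str) -> float:
    """Placeholder: few-shot classification accuracy of `classification_lm`
    when `prompt` is prepended to every input."""
    return 0.0

columns = ["distilroberta-base", "roberta-base", "roberta-large"]  # prompt learned with
rows = ["distilroberta-base", "roberta-base", "roberta-large"]     # classification done with

heatmap: Dict[Tuple[str, str], float] = {}
for col in columns:
    prompt = learn_prompt(col)                 # policy LM would still be distilGPT-2 here
    for row in rows:
        heatmap[(row, col)] = evaluate_prompt(prompt, row)  # prompt reused verbatim
```

Is that the intended setup, or is the column model also the one generating the prompt?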