mingkaid / rl-prompt

Accompanying repo for the RLPrompt paper
MIT License

question: Is the model used in the policy network always distilgpt? #36

Closed pascalhuszar closed 8 months ago

pascalhuszar commented 9 months ago

In the paper you first state, "The policy LM need not be the same as the LM we optimize the prompt for (i.e., task LM)." In the Few-Shot Text Classification section you mention using RoBERTa-large as the "backbone model", but then in the appendix you state, "For all tasks, we uniformly use distilGPT-2." I'm a bit confused. Are the policy network model and the task model (for which the prompt is optimized) the same? Am I right that for the text classification task, both BERT and GPT could be utilized?

MM-IR commented 8 months ago

There are two separate models: the backbone LMs of interest, which are the ones being prompted (the task LMs), and a single prompt-generation LM that serves as the policy network (i.e., distilGPT-2).
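
To make the distinction concrete, here is a minimal sketch of the two roles using HuggingFace `transformers`. This is not the repository's actual API; the model names, label words, and example sentence are illustrative. A small policy LM proposes prompt tokens, and a separate task LM is scored with that prompt prepended to the input.

```python
import torch
from transformers import AutoModelForCausalLM, AutoModelForMaskedLM, AutoTokenizer

# Policy LM (prompt generator / policy network), e.g. distilGPT-2.
policy_tok = AutoTokenizer.from_pretrained("distilgpt2")
policy_lm = AutoModelForCausalLM.from_pretrained("distilgpt2")

# Task LM (the backbone model being prompted), e.g. RoBERTa-large as a masked LM.
task_tok = AutoTokenizer.from_pretrained("roberta-large")
task_lm = AutoModelForMaskedLM.from_pretrained("roberta-large")

# 1) The policy LM samples a few prompt tokens (heavily simplified: RLPrompt trains
#    this sampling with RL, whereas this just draws an untrained sample).
start = torch.tensor([[policy_tok.bos_token_id]])
with torch.no_grad():
    prompt_ids = policy_lm.generate(
        start, max_new_tokens=5, do_sample=True,
        pad_token_id=policy_tok.eos_token_id,
    )
prompt_text = policy_tok.decode(prompt_ids[0, 1:], skip_special_tokens=True)

# 2) The task LM classifies an input with the prompt prepended, e.g. by comparing
#    the scores of two label words at the <mask> position.
text = "The movie was fantastic."
inputs = task_tok(
    f"{prompt_text} {text} It was {task_tok.mask_token}.", return_tensors="pt"
)
with torch.no_grad():
    logits = task_lm(**inputs).logits
mask_pos = (inputs.input_ids[0] == task_tok.mask_token_id).nonzero().item()
label_ids = [task_tok(" great")["input_ids"][1], task_tok(" terrible")["input_ids"][1]]
print(prompt_text, logits[0, mask_pos, label_ids])  # reward is derived from scores like these
```

In RLPrompt itself the task LM stays frozen; only the prompt-generation policy is trained from the downstream reward.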

MM-IR commented 8 months ago

It may help to revisit our method formulation: it is a prompt-generation framework for prompt optimization.

indirected commented 3 months ago

Sorry for reopening this, but I am a bit confused as to which model is being used for what. From my understanding of the code, the model generating the prompts is always a flavor of GPT-2. However, in the paper, the captions of Figures 4 and 7 (the heatmaps) state:

The columns represent the models used to learn the prompts, and the rows represent the models we perform classification with. Brighter color represents higher accuracy.

Also, there are RoBERTa models in the heatmap. Does this mean that RoBERTa was used to generate the prompt, OR that, while training the GPT-2 model to generate the prompts, the downstream LM was RoBERTa, and then at test time the downstream model was swapped for something else to test for transferability?
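
To make my reading of the caption concrete, here is a rough sketch of the grid I think the figures describe; `learn_prompt` and `evaluate_prompt` are hypothetical placeholders, not functions from this repository:

```python
from typing import Dict, Tuple

def learn_prompt(prompt_learning_lm: str) -> str:
    """Placeholder: run prompt optimization with distilGPT-2 as the policy network
    and `prompt_learning_lm` as the task LM; return the discovered prompt string."""
    return f"<prompt optimized against {prompt_learning_lm}>"

def evaluate_prompt(prompt: str, classification_lm: str) -> float:
    """Placeholder: few-shot classification accuracy of `classification_lm`
    when `prompt` is prepended to every input."""
    return 0.0

columns = ["distilroberta-base", "roberta-base", "roberta-large"]  # prompt learned with
rows = ["distilroberta-base", "roberta-base", "roberta-large"]     # classification done with

heatmap: Dict[Tuple[str, str], float] = {}
for col in columns:
    prompt = learn_prompt(col)                 # policy LM would still be distilGPT-2 here
    for row in rows:
        heatmap[(row, col)] = evaluate_prompt(prompt, row)  # prompt reused verbatim
```

Is that the intended setup, or is the column model also the one generating the prompt?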