Closed: pascalhuszar closed this issue 8 months ago
There are two kinds of models: the backbone LMs we are interested in prompting, and one prompt generation LM (i.e., distilgpt2).
Backbone LM (usually a large LM): The model you are interested in prompting. For instance, RoBERTa-large refers to the downstream LM, whose main role in this RL paradigm is to provide the rewards during prompt optimization.
Prompt Generation LM: We currently adopt distilgpt2, since it is much smaller, which makes our algorithm much more efficient.
For a fuller picture, please refer to our method formulation: a prompt generation framework for prompt optimization.
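The division of labor described above — a small trainable policy LM that generates prompt tokens, while the frozen backbone/task LM only supplies rewards — can be sketched with a toy REINFORCE loop. This is a minimal illustration under stated assumptions, not code from the repository: the real method uses distilgpt2 as the policy and e.g. RoBERTa-large for rewards, whereas here both are replaced by toy stand-ins (`PROMPT_VOCAB`, `task_lm_reward`, the per-position `logits` table are all hypothetical names).

```python
import math
import random

random.seed(0)

# Toy stand-ins for the two models. In the real method the *policy LM*
# (distilgpt2) is trained, and the *task LM* (e.g. RoBERTa-large) is
# frozen and only scores prompts. All names here are hypothetical.
PROMPT_VOCAB = ["great", "terrible", "movie", "absolutely", "review"]
GOOD = {"great", "terrible", "absolutely"}  # toy "useful" prompt words

def task_lm_reward(prompt):
    # Stand-in for the frozen task LM: in reality this would run the
    # downstream task with the prompt and return e.g. accuracy/log-prob.
    return sum(tok in GOOD for tok in prompt) / len(prompt)

# Toy "policy LM": independent logits per prompt position; this is the
# only part that gets updated during optimization.
logits = [dict.fromkeys(PROMPT_VOCAB, 0.0) for _ in range(3)]

def softmax(d):
    m = max(d.values())
    exps = {k: math.exp(v - m) for k, v in d.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}

def sample_prompt():
    prompt = []
    for pos in logits:
        probs = softmax(pos)
        toks, ps = zip(*probs.items())
        prompt.append(random.choices(toks, weights=ps)[0])
    return prompt

LR = 0.5
for step in range(300):
    prompt = sample_prompt()
    reward = task_lm_reward(prompt)        # task LM only sets the reward
    for pos, tok in zip(logits, prompt):   # only the policy is updated
        probs = softmax(pos)
        for k in pos:
            grad = (1.0 if k == tok else 0.0) - probs[k]
            pos[k] += LR * reward * grad   # REINFORCE update

best = [max(pos, key=pos.get) for pos in logits]
print(best)  # training should shift each position toward GOOD tokens
```

The point of the sketch is the separation of roles: swapping in a different task LM changes only `task_lm_reward`, exactly as swapping RoBERTa-large for another backbone changes only the reward signal, not the prompt generator.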
Sorry for reopening this, but I am a bit confused about which model is being used for what. From my understanding of the code, the model generating the prompts is always a flavor of gpt-2. However, in the paper, the captions of Figures 4 and 7 (the heatmaps) state:
The columns represent the models used to learn the prompts, and the rows represent the models we perform classification with. Brighter color represents higher accuracy.
Also, there are RoBERTa models in the heatmap. Does this mean RoBERTa was used to generate the prompts, OR that RoBERTa was the downstream LM while training the gpt-2 model to generate the prompts, and the model was then changed at test time to check transferability?
In the paper you first state “The policy LM need not be the same as the LM we optimize the prompt for (i.e., task LM).” In the Few-Shot Text Classification section you mention using RoBERTa-large as the "backbone model", but then in the appendix you state "For all tasks, we uniformly use distilGPT-2". I'm a bit confused: are the policy network model and the task model (for which the prompt is optimized) the same? Am I right that for the text classification task both BERT and GPT could be utilized?