aparna-aketi opened 1 month ago
Hi,
Here "prompt" just means to prompt-based fine-tuning (https://arxiv.org/abs/2012.15723), a very standard way to fine-tuning language models nowadays.
Hi, thanks for the response. Just for clarification: in Figure 2 of the MeZO paper, does FT correspond to full fine-tuning or prompt-based fine-tuning? I want to reproduce the results corresponding to that figure.
Hi, everything we report is prompt-based fine-tuning, since that provides much better performance.
Okay, thanks for the clarification. One more question: mezo.sh sets the number of steps to 100k, while run_fewshot.sh uses 1,000 steps. In Figure 2, is MeZO run for 100x more steps than FT? If so, that doesn't seem like a fair comparison. Even if we consider the backward pass to be 2x more expensive than a forward pass, MeZO should only get 3x as many steps as FT for the comparison to be fair. It would be great if you could provide some insights here.
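To make the accounting in that comment concrete, here is a back-of-the-envelope sketch. The 2x backward-pass cost is the commenter's assumption, and a MeZO step is counted as one forward pass here to match their argument; MeZO's two-point estimate actually uses two forward passes per step, which would halve the "fair" budget below:

```python
# Forward-pass-equivalent cost accounting for the comparison above.
ft_steps = 1_000        # steps in run_fewshot.sh
mezo_steps = 100_000    # steps in mezo.sh

# Assumption (from the comment): backward ~ 2x forward, so one FT step
# costs ~3 forward-equivalents.
ft_cost = ft_steps * (1 + 2)

# Equal-compute step budget for MeZO under this accounting: 3x FT's steps.
fair_mezo_steps = ft_cost

print(fair_mezo_steps)  # -> 3000 (vs. the 100,000 steps actually used)
```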
Hi,
Yes, MeZO is run with 100x more steps than FT, so it is not a fair comparison in terms of wall-clock time. The RoBERTa-large experiments are mainly meant to showcase that it is possible to train models without backpropagation (which saves a lot of memory).
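For context on the "without backpropagation" point, below is a minimal sketch of one MeZO step, following Algorithm 1 of the paper (the function and argument names are illustrative, not the repo's actual API). The gradient is estimated from two forward passes, and the random direction z is regenerated from a seed rather than stored, which is where the memory savings come from:

```python
# Minimal sketch of one MeZO (zeroth-order) step for a PyTorch model.
# Assumes loss_fn(model, batch) returns a scalar loss tensor.
import torch

def mezo_step(model, loss_fn, batch, lr=1e-6, eps=1e-3):
    seed = torch.randint(0, 2**31 - 1, (1,)).item()

    def perturb(scale):
        # Regenerate the same random direction z from the seed each time,
        # so z never has to be stored (this is MeZO's memory trick).
        torch.manual_seed(seed)
        for p in model.parameters():
            z = torch.randn_like(p)
            p.data.add_(scale * eps * z)

    with torch.no_grad():
        perturb(+1)                      # theta + eps * z
        loss_plus = loss_fn(model, batch)
        perturb(-2)                      # theta - eps * z
        loss_minus = loss_fn(model, batch)
        perturb(+1)                      # restore theta

        # Projected gradient estimate (a scalar), from two forward passes.
        grad = (loss_plus - loss_minus) / (2 * eps)

        # Update: theta <- theta - lr * grad * z, regenerating z again.
        torch.manual_seed(seed)
        for p in model.parameters():
            z = torch.randn_like(p)
            p.data.add_(-lr * grad * z)
```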
I want to run full fine-tuning with RoBERTa-large. The README suggests using the following command:
However, the type parameter is set to `TYPE:-"prompt"`. Shouldn't this be set to `"finetune"`?
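(For what it's worth, `TYPE:-"prompt"` is bash's default-value syntax, so the script should pick up an override from the environment; something like `TYPE=finetune bash run_fewshot.sh` would presumably switch it to full fine-tuning. The exact invocation is an assumption, not taken from the README.)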