Closed: Tsingularity closed this issue 2 years ago
Hi! In the ICLR paper, they used a BART model, which is perhaps better suited for text generation, whereas we used the original T5 model. Additionally, the train/validation/test splits are different, so these numbers are not directly comparable. We have provided the scripts to reproduce our results in the experiments.
P-tuning v2 uses the prefix-tuning method (Li and Liang, 2021), which adds trainable prompt tokens to every transformer layer. In contrast, our results are based on the prompt-tuning paper, where prompts are added to the input tokens only. These two methods are different, as the sketch below illustrates.
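To make the distinction concrete, here is a minimal PyTorch sketch of the two parameterizations. This is not code from this repo; dimensions and names such as `num_virtual_tokens`, `d_model`, and `PrefixTuning` are illustrative assumptions.

```python
# Hypothetical sketch contrasting prompt tuning vs. prefix tuning.
import torch
import torch.nn as nn

d_model, num_layers, num_heads = 512, 6, 8
num_virtual_tokens = 20
head_dim = d_model // num_heads

# Prompt tuning (Lester et al., 2021): trainable embeddings are prepended
# to the input embeddings only; the frozen model is otherwise unchanged.
class PromptTuningEmbeddings(nn.Module):
    def __init__(self, base_embed: nn.Embedding):
        super().__init__()
        self.base_embed = base_embed
        self.soft_prompt = nn.Parameter(torch.randn(num_virtual_tokens, d_model) * 0.02)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        tok = self.base_embed(input_ids)                       # (B, T, d_model)
        prompt = self.soft_prompt.unsqueeze(0).expand(tok.size(0), -1, -1)
        return torch.cat([prompt, tok], dim=1)                 # (B, P+T, d_model)

# Prefix tuning (Li and Liang, 2021), as used by P-tuning v2: trainable
# key/value prefixes are injected into the attention of *every* layer.
class PrefixTuning(nn.Module):
    def __init__(self):
        super().__init__()
        # One (key, value) prefix per layer, per attention head.
        self.prefix_kv = nn.Parameter(
            torch.randn(num_layers, 2, num_heads, num_virtual_tokens, head_dim) * 0.02
        )

    def past_key_values(self, batch_size: int):
        # Per-layer (key, value) tensors shaped like cached attention states,
        # which each frozen layer attends to in addition to the real tokens.
        kv = self.prefix_kv.unsqueeze(1).expand(-1, batch_size, -1, -1, -1, -1)
        return [(kv[i, :, 0], kv[i, :, 1]) for i in range(num_layers)]

# Quick shape check with toy inputs.
embed = PromptTuningEmbeddings(nn.Embedding(1000, d_model))
print(embed(torch.randint(0, 1000, (2, 16))).shape)  # torch.Size([2, 36, 512])
k, v = PrefixTuning().past_key_values(batch_size=2)[0]
print(k.shape, v.shape)                              # (2, 8, 20, 64) each
```

So prompt tuning trains only the input-level soft prompt, while prefix tuning trains per-layer prefixes, giving it many more tunable parameters and direct influence on every layer's attention.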
I see. Thanks for the detailed explanation!
Thanks for the great work!
I am just wondering why the prompt-tuning scores in Table 1 are much lower than those reported in other papers. For instance, on MNLI and SST-2, your prompt-tuning numbers are 81 and 90, but in Table 2 of this recently published ICLR paper the numbers are 86 and 94, which are very close to the full fine-tuning numbers. Another paper we can use to cross-validate prompt-tuning GLUE scores is P-tuning v2, and their numbers also look higher than the ones in your Table 1.
So I am just wondering: is there any chance that prompt-tuning's performance is underestimated here?