The per-task performance in Table 2 for our approach is:

Task:   CoLA   SST-2  MRPC   QQP    MNLI-m  MNLI-mm  QNLI   RTE    Avg
Score:  55.5   94.72  87.25  89.47  85.97   86.31    91.93  69.68  82.10
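Note that the reported average appears to treat MNLI-m and MNLI-mm as a single MNLI task (their mean); a minimal sketch of that computation, which reproduces the value above:

```python
# Per-task scores from the table above.
scores = {
    "cola": 55.5, "sst2": 94.72, "mrpc": 87.25, "qqp": 89.47,
    "mnli_m": 85.97, "mnli_mm": 86.31, "qnli": 91.93, "rte": 69.68,
}

# MNLI-m and MNLI-mm are first merged into a single MNLI score ...
mnli = (scores["mnli_m"] + scores["mnli_mm"]) / 2

# ... and the average is then taken over the resulting 7 tasks.
per_task = [scores["cola"], scores["sst2"], scores["mrpc"], scores["qqp"],
            mnli, scores["qnli"], scores["rte"]]
print(f"{sum(per_task) / len(per_task):.2f}")  # -> 82.10
```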
Following the Y-tuning paper, we report the best result across three learning rates (3e-4, 1e-3, 3e-3) and three seeds. The best learning rate is 3e-4 for QQP and MNLI, and 1e-3 for the remaining tasks.
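For concreteness, a minimal sketch of that selection protocol (the `evaluate` function below is a hypothetical placeholder for the actual fine-tuning and dev-set evaluation, which is not part of this thread):

```python
import itertools
import random

def evaluate(task: str, lr: float, seed: int) -> float:
    # Hypothetical placeholder: in practice this would fine-tune with the given
    # learning rate and seed and return the dev-set score for the task.
    # A dummy deterministic number is returned here so the sketch is runnable.
    random.seed(hash((task, lr, seed)) % (2 ** 32))
    return random.uniform(0.0, 100.0)

def best_dev_score(task: str,
                   lrs=(3e-4, 1e-3, 3e-3),
                   seeds=(1, 2, 3)) -> float:
    # Report the best dev result over the learning-rate x seed grid,
    # matching the protocol described above.
    return max(evaluate(task, lr, seed)
               for lr, seed in itertools.product(lrs, seeds))

print(best_dev_score("rte"))
```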
Hi,
In Table 2 you used the BART-large encoder. Would it be possible for you to release the scores for each individual GLUE task rather than only the average?