Open GeondoPark opened 3 years ago
Hi, Geondo.
The performance on RTE is unstable because the dataset is small, so we simply ran fine-tuning a few more times.
For SQuAD 2.0, we found that performing the intermediate layer distillation on the augmented task data set for more epochs is helpful.
Thank you for your response.
I am wondering which settings you tuned to improve the performance on RTE (learning rate, epochs, temperature, intermediate layer coefficient, and so on).
@XiaoqiJiao Regarding the squad_v2 dataset, can you elaborate on how the augmentation was done? Were both the context and the questions augmented? Did you filter out inputs where the question no longer appears in the context? Did you use 20x data as well?
Thanks in advance
Hi, huawei-noah team.
Thank you for sharing the code of your interesting work, TinyBERT. I wonder which factors led to the performance improvement on the RTE and SQuAD 2.0 datasets between the previous and recent versions.
Thanks.