joongbo / tta

Repository for the paper "Fast and Accurate Deep Bidirectional Language Representations for Unsupervised Learning"
Apache License 2.0

Scale up performance comparison and GLUE task #2

Closed: La-SilverLand closed this issue 3 years ago

La-SilverLand commented 3 years ago

Hi, I've got two questions:

1. Have you scaled up the training data and compared with BERT accordingly?
2. BERT and other Transformer-based models usually run the GLUE task suite, but your paper does not include this part. What is your consideration?

joongbo commented 3 years ago

Hi, here are my answers:

  1. Have you scaled up the training data and compared with BERT accordingly? ANS> No, I have not. For now I am working on scaling up the model size first; using more training data is our next concern.

  2. BERT and other Transformer-based models usually run the GLUE task suite, but your paper does not include this part. What is your consideration? ANS> This is because the main focus of our paper is on unsupervised learning tasks such as N-best list re-ranking (whereas, as you know, the GLUE tasks are supervised).
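To give a rough idea of what I mean by unsupervised re-ranking, here is a minimal sketch that scores each N-best candidate with a masked LM's pseudo-log-likelihood and sorts by that score. It uses a generic BERT from Hugging Face as a stand-in model; it is not our TTA code, which computes the score for all positions in a single pass instead of masking tokens one by one.

```python
# Illustrative sketch of N-best list re-ranking with a masked LM.
# The model/tokenizer below are placeholders, not the TTA model.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Sum of log P(token | rest) with each token masked in turn."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(ids) - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

def rerank(nbest):
    """Re-order candidate hypotheses, best (highest score) first."""
    return sorted(nbest, key=pseudo_log_likelihood, reverse=True)

print(rerank(["the cat sat on the mat", "the cat sat in the mat"]))
```

The key point is that no labels are needed: the language model's own scores decide the ranking, which is why we treat it as an unsupervised evaluation.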

For your information, I plan to run the GLUE tasks after enhancing the model. For now, the 3-layer TTA is not well suited to fine-tuning on downstream tasks.

Thanks for your interest, and I'd be happy to answer any follow-up questions.

La-SilverLand commented 3 years ago

Thanks, I'll keep an eye on your next steps :)