Closed La-SilverLand closed 3 years ago
Hi, here are my answers:
Have you scaled up the training data and compared with BERT accordingly? ANS> No, I have not. For now I am working on scaling up the model size first; using more training data is our next step.
BERT and other transformer-based models often run the GLUE task suite, but your paper does not include this part; what was your consideration? ANS> This is because the main focus of our paper is on unsupervised tasks such as N-best list re-ranking (whereas, as you know, the GLUE tasks are supervised).
For your information, I plan to run the GLUE tasks after enhancing the model. For now, the 3-layer TTA is not strong enough for fine-tuning on downstream tasks.
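For readers unfamiliar with the task mentioned above, here is a minimal sketch of N-best list re-ranking. The unigram LM below is only a toy stand-in for illustration (it is not the paper's TTA model); in practice you would replace `lm_score` with the model's pseudo-log-likelihood for each hypothesis. The `lm_weight` interpolation parameter is likewise a hypothetical name.

```python
import math
from collections import Counter

def train_unigram(corpus):
    """Fit a toy unigram LM with add-one smoothing; returns a log-prob function."""
    counts = Counter(w for sent in corpus for w in sent.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 for unseen words
    return lambda w: math.log((counts[w] + 1) / (total + vocab))

def lm_score(logp, sentence):
    """Log-probability of a sentence under the (toy) LM."""
    return sum(logp(w) for w in sentence.split())

def rerank(nbest, logp, lm_weight=0.5):
    """Re-rank (hypothesis, first_pass_score) pairs by an interpolated score."""
    scored = [(hyp, s + lm_weight * lm_score(logp, hyp)) for hyp, s in nbest]
    return sorted(scored, key=lambda x: x[1], reverse=True)

# Example: the LM rescues the correct hypothesis despite a slightly
# lower first-pass (e.g. acoustic) score.
corpus = ["the cat sat on the mat", "the dog sat on the rug"]
logp = train_unigram(corpus)
nbest = [("the cat sat on the mat", -1.0),
         ("the cat sad on the mat", -0.9)]
best = rerank(nbest, logp)[0][0]  # -> "the cat sat on the mat"
```

The key point is that the re-ranker only needs a score per hypothesis, so it can use an unsupervised language model; no labeled data is required, unlike GLUE fine-tuning.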
Thanks for your interest, and I'd be happy to answer any more follow-up questions.
Thanks, I'll keep an eye on your next steps :)
Hi, I've got 2 questions: 1) Have you scaled up the training data and compared with BERT accordingly? 2) BERT and other transformer-based models often run the GLUE task suite, but your paper does not include this part; what was your consideration?