Open · XuhuiZhou opened this issue 5 years ago
XLNet-Large has the same number of parameters as BERT-Large, while XLNet-Base has the same number of parameters as BERT-Base. I haven't looked at your code, though.
Maybe there are problems with Hugging Face's pytorch-transformers implementation. I saw similar issues in their repository, so you could refer to those.
Hi, I am using XLNet as a language model with the code provided by Hugging Face's pytorch-transformers. However, XLNet consistently underperformed BERT in our experiments. Considering its more advanced design, we are curious how that could happen. For example, we tested the models' coreference resolution ability on the Winograd Schema Challenge dataset. An example from the dataset would be:
We let the model choose the correct candidate by calculating the perplexity of each sentence. In the end, we got the following results:

- BERT-large: 62% acc
- XLNet-base: 54.4% acc
- XLNet-large: 63.6% acc

So from my point of view, XLNet-base should be compared to BERT-large, since they have a similar parameter size. Furthermore, we have run experiments on other test datasets, such as SWAG, and saw the same phenomenon. Any thoughts on this problem would be appreciated :)
Code:
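For reference, here is a minimal sketch of this kind of perplexity-based scoring, written against the current `transformers` API (the successor to pytorch-transformers). The model name and the Winograd-style sentence pair are illustrative assumptions rather than the author's actual setup:

```python
import torch
from transformers import XLNetLMHeadModel, XLNetTokenizer

tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
model = XLNetLMHeadModel.from_pretrained("xlnet-base-cased")
model.eval()

def sentence_loss(sentence: str) -> float:
    """Average negative log-likelihood per token; lower means more probable."""
    # Use the token ids themselves as labels so the model returns its LM loss.
    # Note: without perm_mask/target_mapping this is only a rough score for
    # XLNet, whose pretraining objective is permutation-based.
    input_ids = torch.tensor([tokenizer.encode(sentence)])
    with torch.no_grad():
        outputs = model(input_ids, labels=input_ids)
    return outputs.loss.item()

# Pick whichever candidate the model assigns the lower loss (higher probability).
# This Winograd-style pair is a standard illustrative example, not the
# specific item used in the experiments above.
candidates = [
    "The trophy didn't fit in the suitcase because the trophy was too big.",
    "The trophy didn't fit in the suitcase because the suitcase was too big.",
]
print(min(candidates, key=sentence_loss))
```

One caveat worth flagging: scoring XLNet this way may not reproduce its permutation-LM training objective, so a naive perplexity comparison against BERT should be treated with care.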