wietsedv / bertje

BERTje is a Dutch pre-trained BERT model developed at the University of Groningen. (EMNLP Findings 2020) "What’s so special about BERT’s layers? A closer look at the NLP pipeline in monolingual and multilingual models"
https://aclanthology.org/2020.findings-emnlp.389/
Apache License 2.0

Reference to RobBERT #2

Closed peter-vandenabeele-axa closed 4 years ago

peter-vandenabeele-axa commented 4 years ago

This reference shows a comparison with RobBERT:

https://arxiv.org/pdf/2001.06286.pdf

What are the advantages of "bertje" (maybe it is smaller/simpler/cheaper to run)?

FYI, there is a discussion on LinkedIn (sign-up required) at:

https://www.linkedin.com/feed/update/urn:li:activity:6631077105952178176?commentUrn=urn%3Ali%3Acomment%3A%28activity%3A6631077105952178176%2C6631183002305077250%29&replyUrn=urn%3Ali%3Acomment%3A%28activity%3A6631077105952178176%2C6631187459520675841%29

wietsedv commented 4 years ago

Your question implies that RobBERT performs "better" than BERTje. You cannot really conclude that based on their results. The benchmark results in the README of this repository show that BERTje outperforms RobBERT on other tasks. I am not saying that BERTje performs best for every task or will always be the best choice, just that you have to be critical about who is doing the evaluation and which experiments are actually done to evaluate the model. In case of doubt, you could always try multiple models.

The RobBERT paper shows results for two tasks. For the sentiment analysis task, they do not really compare model performance but rather method performance since they have not fine-tuned BERTje using their method. It is possible that their method with BERTje as the base model will result in higher performance. And I cannot say anything about the die/dat task other than that this task feels a little cherry-picked to me. And being able to distinguish "die" and "dat" does not really say very much about general language understanding.

Also, regardless of downstream task performance, the tokenisation vocabulary of BERTje is objectively better, since BERTje uses a vocabulary that is created specifically for Dutch (https://s3.amazonaws.com/models.huggingface.co/bert/wietsedv/bert-base-dutch-cased/vocab.txt) whereas RobBERT uses the English RoBERTa vocabulary (https://s3.amazonaws.com/models.huggingface.co/bert/pdelobelle/robBERT-base/vocab.json).
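
As an illustration, here is a minimal sketch of how you could compare what the two vocabularies do with a Dutch sentence; the model identifiers are taken from the links above and may since have been renamed on the Hugging Face hub:

```python
# Minimal sketch: compare how the two vocabularies split a Dutch sentence.
# Model identifiers are assumptions based on the links above and may have
# changed on the Hugging Face hub since this discussion.
from transformers import AutoTokenizer

sentence = "De kat zat rustig op de vensterbank."

bertje_tok = AutoTokenizer.from_pretrained("wietsedv/bert-base-dutch-cased")
robbert_tok = AutoTokenizer.from_pretrained("pdelobelle/robBERT-base")

# BERTje's Dutch WordPiece vocabulary tends to keep common Dutch words intact,
# while a vocabulary built for English typically splits them into more pieces.
print("BERTje :", bertje_tok.tokenize(sentence))
print("RobBERT:", robbert_tok.tokenize(sentence))
```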

peter-vandenabeele-axa commented 4 years ago

Thanks. I have relayed this reply back to the discussion thread on LinkedIn.

If we see no further discussion here, we could close this issue. Or maybe add a note in the general README, as this question will come up every now and then when users want to choose the best model for Dutch-language texts.

Also, maybe you could comment on my (naive) question about complexity and cost. It is not just accuracy that matters, but also the cost, complexity and time needed to get those results ...

wietsedv commented 4 years ago

Thanks, I will close this issue. Aspects like cost, complexity, size and so on are more or less equivalent between these models, as both are based on a 12-layer transformer encoder. These models can also be used interchangeably without any difficulty if you use the Transformers library by Hugging Face.
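
For example, a minimal sketch of what that interchangeability looks like in practice: only the model identifier changes and the rest of the code stays the same. The identifiers are assumptions based on the links above and may since have been renamed on the hub.

```python
# Minimal sketch: swapping BERTje and RobBERT only requires changing the
# model identifier; tokenizer and model loading code is otherwise identical.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "wietsedv/bert-base-dutch-cased"  # or "pdelobelle/robBERT-base"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

inputs = tokenizer("Dit is een voorbeeldzin.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # (1, 2) for a binary classification head
```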