This reference shows a comparison with RobBERT:
https://arxiv.org/pdf/2001.06286.pdf

What are the advantages of BERTje (maybe it is smaller/simpler/cheaper to run)?

FYI, there is a discussion on LinkedIn (sign-up required) at:
https://www.linkedin.com/feed/update/urn:li:activity:6631077105952178176?commentUrn=urn%3Ali%3Acomment%3A%28activity%3A6631077105952178176%2C6631183002305077250%29&replyUrn=urn%3Ali%3Acomment%3A%28activity%3A6631077105952178176%2C6631187459520675841%29
Your question implies that RobBERT performs "better" than BERTje. You cannot really conclude that based on their results. The benchmark results in the README of this repository show that BERTje outperforms RobBERT on other tasks. I am not saying that BERTje performs best for every task or will always be the best choice, but you have to be critical about who is doing the evaluation and which experiments are actually done to evaluate the model. In case of doubt, you could always try multiple models.
The RobBERT paper shows results for two tasks. For the sentiment analysis task, they do not really compare model performance but rather method performance, since they have not fine-tuned BERTje using their method. It is possible that their method with BERTje as the base model would result in higher performance. I cannot say anything about the die/dat task other than that it feels a little cherry-picked to me; being able to distinguish "die" and "dat" does not say very much about general language understanding.
Also, regardless of downstream task performance, the tokenisation vocabulary of BERTje is objectively better, since BERTje uses a vocabulary that was created specifically for Dutch (https://s3.amazonaws.com/models.huggingface.co/bert/wietsedv/bert-base-dutch-cased/vocab.txt) whereas RobBERT uses the English RoBERTa vocabulary (https://s3.amazonaws.com/models.huggingface.co/bert/pdelobelle/robBERT-base/vocab.json).
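You can see this difference for yourself by tokenising a Dutch sentence with both tokenizers. A minimal sketch, assuming the Transformers library and the model identifiers taken from the vocabulary URLs above (which may have been renamed on the Hugging Face hub since):

```python
# Minimal sketch: compare how each model's tokenizer splits a Dutch sentence.
# Model identifiers are taken from the vocabulary URLs above and may have
# changed on the Hugging Face hub since this thread was written.
from transformers import AutoTokenizer

bertje_tok = AutoTokenizer.from_pretrained("wietsedv/bert-base-dutch-cased")
robbert_tok = AutoTokenizer.from_pretrained("pdelobelle/robBERT-base")

sentence = "Het meisje dat daar loopt, zwaait naar de hond die blaft."

# BERTje uses a Dutch-specific WordPiece vocabulary, so common Dutch words
# tend to stay whole; RobBERT reuses the English RoBERTa byte-level BPE
# vocabulary, so Dutch words are more often split into smaller pieces.
print(bertje_tok.tokenize(sentence))
print(robbert_tok.tokenize(sentence))
```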
Thanks. I have relayed this reply back to the discussion thread on LinkedIn.
If we see no further discussion here, we could close this issue. Or maybe add a note to the general README, as this question will come up every now and then when users want to pick the optimal model for Dutch-language texts.
Also, maybe you could comment on my (naive) question about complexity and cost. Not just the precision is relevant, but also the cost, complexity and time needed to get to those results ...
Thanks, I will close this issue. Aspects like cost, complexity, size and so on are more or less equivalent between these models, since both are based on a 12-layer transformer encoder. The models can also be used interchangeably without any difficulty if you use the Transformers library by Hugging Face.
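For instance, swapping one model for the other is a one-line change. A minimal sketch, again assuming the model identifiers used earlier in this thread:

```python
# Minimal sketch: both models expose the same encoder interface in the
# Transformers library, so switching between them is just a matter of
# changing the model identifier.
from transformers import AutoModel, AutoTokenizer

model_name = "wietsedv/bert-base-dutch-cased"  # or "pdelobelle/robBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

inputs = tokenizer("Dit is een voorbeeldzin.", return_tensors="pt")
outputs = model(**inputs)

# Both are 12-layer base models with hidden size 768:
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, 768)
```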