wietsedv / bertje

BERTje is a Dutch pre-trained BERT model developed at the University of Groningen.

"What's so special about BERT's layers? A closer look at the NLP pipeline in monolingual and multilingual models" (Findings of EMNLP 2020)
https://aclanthology.org/2020.findings-emnlp.389/
Apache License 2.0

Training Data for BERTje #25

Closed: siebeniris closed this issue 2 years ago

siebeniris commented 2 years ago

Hello,

Thank you very much for BERTje!

We would like to do some analysis of BERT models in different languages. Would it be possible to release the data you used for pre-training the model, especially the datasets without citations?

Thank you very much!

wietsedv commented 2 years ago

Given the nature and sources of the data, we are legally unable to share these datasets. This is unfortunate for reproducibility, but there is nothing we can do about it. I hope you understand.

siebeniris commented 2 years ago

Okay, thanks for the quick reply :)