huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0

Do 'bookcorpus' and 'wikipedia' belong to the same datasets that Google used for pretraining BERT? #666

Closed: wahab4114 closed this issue 4 years ago

thomwolf commented 4 years ago

No, they are separate, similar copies, but they are not provided by the authors of the official BERT models.