Pretraining data sharing

instadeepai / tunbert

TunBERT is the first release of a pre-trained BERT model for the Tunisian dialect, trained on a Tunisian Common-Crawl-based dataset. TunBERT was applied to three downstream NLP tasks: Sentiment Analysis (SA), Tunisian Dialect Identification (TDI), and Reading Comprehension Question-Answering (RCQA).

MIT License

107 stars, 37 forks

Pretraining data sharing #5

Open ArijRB opened 2 years ago

ArijRB commented 2 years ago

Hello, thank you for sharing the models.

I was wondering if you could share details about the pre-training data. Would it be possible to share the data used for pre-training?

Thank you in advance.