helboukkouri / character-bert

Main repository for "CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters"
Apache License 2.0

Pretrain CharacterBERT with new data #15

Closed steveguang closed 3 years ago

steveguang commented 3 years ago

Hi @helboukkouri, one more question. If I want to pre-train CharacterBERT on a new, large dataset, starting from general-char-bert, what should I do? In Hugging Face Transformers, for example, there is a run_language_modeling script that can continue training from bert-base or other models.

helboukkouri commented 3 years ago

Hi @steveguang, have you had a look at the pre-training code? https://github.com/helboukkouri/character-bert-pretraining You may need to change some things in the main pre-training file according to your needs. It's just starter code :)
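
For anyone landing here later, below is a minimal sketch of what continuing training from the general CharacterBERT checkpoint could look like, built from the `CharacterBertModel` and `CharacterIndexer` usage shown in this repo's README. The checkpoint path, the linear masked-LM head, the vocabulary size, and the random labels are placeholders for illustration only; the actual pre-training objective and data pipeline are the ones implemented in the character-bert-pretraining repo linked above.

```python
# Hedged sketch: one "continued pre-training" step from a local
# general_character_bert checkpoint. The MLM head, labels, and paths
# below are placeholders, not the official pre-training code.
import torch
from torch import nn
from transformers import BertTokenizer

from modeling.character_bert import CharacterBertModel
from utils.character_cnn import CharacterIndexer

# Word-level tokenization via BERT's BasicTokenizer, as in the README
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = CharacterBertModel.from_pretrained(
    './pretrained-models/general_character_bert/')  # assumed local download
indexer = CharacterIndexer()  # maps each token to its character ids

# One sentence from the new corpus
tokens = ['[CLS]'] + tokenizer.basic_tokenizer.tokenize('pre-train on new data') + ['[SEP]']
inputs = indexer.as_padded_tensor([tokens])  # shape: (batch=1, seq_len, 50)

# Placeholder masked-LM head over a word-level output vocabulary
vocab_size = 30522
mlm_head = nn.Linear(model.config.hidden_size, vocab_size)
optimizer = torch.optim.AdamW(
    list(model.parameters()) + list(mlm_head.parameters()), lr=5e-5)

sequence_output, _ = model(inputs)   # (1, seq_len, hidden_size)
logits = mlm_head(sequence_output)   # (1, seq_len, vocab_size)

# Placeholder labels: real pre-training derives these from the masking procedure
labels = torch.randint(0, vocab_size, logits.shape[:2])
loss = nn.CrossEntropyLoss()(logits.reshape(-1, vocab_size), labels.reshape(-1))
loss.backward()
optimizer.step()
```

In practice you would wrap this in a proper data loader over your new corpus and reuse the masking, optimizer schedule, and checkpointing logic from the pre-training repo rather than the stand-ins above.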