dr-GitHub-account closed this issue 2 years ago.
Hello, thanks for opening an issue! We try to keep the GitHub issues for bugs/feature requests. Could you ask your question on the forum instead?
Thanks!
I will. Thanks for the guidance.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Feature request
Adaptive pretraining methods such as domain-adaptive pretraining (DAPT) and task-adaptive pretraining (TAPT) can benefit downstream tasks, as shown in https://aclanthology.org/2020.acl-main.740.pdf. On https://huggingface.co/models there are many successful models pretrained on source-domain data. I would like to apply adaptive pretraining (with objectives such as MLM) to chinese-roberta-wwm-ext-large (https://huggingface.co/hfl/chinese-roberta-wwm-ext-large) using unlabeled target-domain data, so as to get better results on downstream tasks.
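For what it's worth, this kind of continued MLM pretraining is already possible with the existing transformers API. Below is a minimal sketch, not a definitive recipe: the corpus file name domain_corpus.txt, the output directory, and all hyperparameters are placeholders I chose for illustration, and the stock DataCollatorForLanguageModeling applies token-level rather than whole-word masking, so it only approximates the whole-word-masking objective the model was originally trained with.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "hfl/chinese-roberta-wwm-ext-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Unlabeled target-domain text, one passage per line (placeholder file name).
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Random token masking at the usual 15% rate; this is token-level,
# not whole-word, masking.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

args = TrainingArguments(
    output_dir="dapt-chinese-roberta",  # placeholder
    per_device_train_batch_size=8,
    num_train_epochs=1,
    learning_rate=5e-5,
    save_steps=10_000,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
trainer.save_model("dapt-chinese-roberta")
```

The transformers repository also ships an example script for this workflow (examples/pytorch/language-modeling/run_mlm.py), which may be preferable to a hand-rolled loop like the one above.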
Motivation
BERT and related models are already benefiting many areas. The following are some examples:
Your contribution
Hopefully, a domain-specific pretrained language model.