Closed peregilk closed 4 years ago
Hi @peregilk, the weights in this repo are converted from the original checkpoints Google released, trained on an English corpus only.
Currently, this repo only supports fine-tuning on downstream tasks. It reproduces the same results as reported in Google's original repo (tested).
Fine-tuning the model on domain-specific data is not supported yet; it will be added soon.
Contributions are welcome.
@peregilk Support for pre-training and fine-tuning on domain-specific data has been added: https://github.com/kamalkraj/ALBERT-TF2.0/blob/master/pretraining.md
Awesome. Really looking forward to testing this. Thanks a lot.
If anyone else is reading the post, I think the correct link should be: https://github.com/kamalkraj/ALBERT-TF2.0/blob/master/pretraining.md
I am a bit confused about the terminology here, and I might be completely wrong about this. However, the original BERT paper seems to call only the supervised part "fine-tuning", and refers to this as "additional domain-specific pre-training".
You can fine-tune the pre-trained MLM and SOP model on a domain such as medical text, or you can pre-train from scratch on domain-specific data. For an example of a BERT pre-trained model fine-tuned this way, see https://github.com/dmis-lab/biobert
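For anyone unsure what the MLM part of that additional pre-training actually does to the input: below is a minimal plain-Python sketch of BERT-style token masking (the 80% `[MASK]` / 10% random / 10% unchanged rule from the original BERT paper). The tiny vocabulary and function name are my own, just for illustration; the real repo builds these examples inside its pre-training data pipeline.

```python
import random

MASK = "[MASK]"
# Toy "domain" vocabulary used only for the 10% random-replacement case.
TOY_VOCAB = ["the", "cell", "dna", "protein", "gene"]

def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """Apply BERT-style masking. Each token is selected with
    probability mask_prob; of the selected tokens, 80% become
    [MASK], 10% become a random token, 10% stay unchanged.
    Returns (masked_tokens, labels) where labels holds the
    original token at masked positions and None elsewhere."""
    rng = rng or random.Random()
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must predict the original token
            r = rng.random()
            if r < 0.8:
                masked.append(MASK)
            elif r < 0.9:
                masked.append(rng.choice(TOY_VOCAB))
            else:
                masked.append(tok)  # kept as-is, but still predicted
        else:
            labels.append(None)  # no MLM loss at this position
            masked.append(tok)
    return masked, labels
```

Running additional pre-training on a domain corpus just means generating batches like this from your own text and continuing to train from the released checkpoint, rather than from random initialization.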
It was just a question regarding the terminology.
I just noticed that the BERT page says "If your task has a large domain-specific corpus available (e.g., "movie reviews" or "scientific papers"), it will likely be beneficial to run additional steps of pre-training on your corpus, starting from the BERT checkpoint."
I know the point of doing additional MLM/SOP training on a domain-specific corpus really is to "fine-tune" the weights trained on the general corpus. I guess the reason they are not calling it "fine-tuning" is that that term is reserved for task-specific training.
Could you give some more info about the weights linked here? Are they trained on an English corpus only, as in the original paper?
You write that the last layers are not available. That would probably mean they cannot be used for additional domain-specific pre-training, right? What would be required to do this?