rewicks / ersatz

Apache License 2.0

How to use the pretrained model for fine-tuning? #7

Open robotsp opened 2 years ago

robotsp commented 2 years ago

I see there is a pretrained model in the repo "https://github.com/rewicks/ersatz-models/tree/main/monolingual/en". Since I cannot find the tokenizer vocabulary, I am not sure how to fine-tune the existing model.

rewicks commented 2 years ago

Hi,

The current codebase isn't really set up for fine-tuning. However, you can access the tokenizer vocabulary inside the model file itself, which looks like:

model = {
    'vocab': sentencepiece_model_object,
    'args': args_namespace_from_training,
    'weights': model_state_dict
}

when loaded. You can see the load_model(checkpoint_path) function in trainer.py for more details.
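
For anyone who wants to inspect the file directly, here is a minimal sketch of pulling the three pieces apart. It assumes the checkpoint is a dictionary written with torch.save and that it uses the key names shown above. The helper names load_checkpoint and split_checkpoint are hypothetical and not part of ersatz; the load_model function in trainer.py remains the authoritative reference.

```python
from typing import Any, Dict, Tuple

# Key names taken from the layout described in this thread.
EXPECTED_KEYS = ("vocab", "args", "weights")


def load_checkpoint(checkpoint_path: str) -> Dict[str, Any]:
    """Load the checkpoint dictionary onto the CPU.

    Assumption: the file was written with torch.save. torch is imported
    lazily so split_checkpoint can be used without PyTorch installed.
    """
    import torch

    # Recent PyTorch releases default to weights_only=True, which rejects
    # arbitrary pickled objects such as a tokenizer or an args namespace.
    # Only disable it for files you trust.
    return torch.load(checkpoint_path, map_location="cpu", weights_only=False)


def split_checkpoint(checkpoint: Dict[str, Any]) -> Tuple[Any, Any, Any]:
    """Return (vocab, args, weights), failing loudly if a key is absent."""
    missing = [key for key in EXPECTED_KEYS if key not in checkpoint]
    if missing:
        raise KeyError(f"checkpoint is missing expected keys: {missing}")
    return checkpoint["vocab"], checkpoint["args"], checkpoint["weights"]
```

Typical use would be `vocab, args, weights = split_checkpoint(load_checkpoint(path))`, after which `vocab` is the tokenizer object you were looking for and `weights` is the state dict you would load into a model before any further training.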