n-waves / multifit

Code to reproduce the results of the paper "MultiFiT: Efficient Multi-lingual Language Model Fine-tuning" https://arxiv.org/abs/1909.04761

Create classifier with fastai v1.0 #83

Open ghost opened 3 years ago

ghost commented 3 years ago

Your README.md explains how to create a language model that can be used in fastai:

from fastai.text import *
import multifit

exp = multifit.from_pretrained("name of the model")
fa_config = exp.pretrain_lm.tokenizer.get_fastai_config(add_open_file_processor=True)
# imdb_path (dataset root) and bs (batch size) are assumed to be defined
data_lm = (TextList.from_folder(imdb_path, **fa_config)
            .filter_by_folder(include=['train', 'test', 'unsup'])
            .split_by_rand_pct(0.1)
            .label_for_lm()
            .databunch(bs=bs))
learn = exp.finetune_lm.get_learner(data_lm)
# learn is a preconfigured fastai learner with a pretrained model loaded
learn.fit_one_cycle(10)
learn.save_encoder("enc")
...
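
For completeness: save_encoder writes the weights under the learner's model directory, so to reuse them from another notebook I copy them out to Drive. A sketch of that step (the fine_tuned_enc name is mine and matches the path in the traceback below):

import shutil

# learn.save_encoder("enc") wrote <learn.path>/<learn.model_dir>/enc.pth;
# copy it to Drive under the name the classifier notebook later loads
shutil.copy(str(learn.path / learn.model_dir / 'enc.pth'),
            '/content/drive/MyDrive/models3/fine_tuned_enc.pth')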

However, when I try to create a classifier and load the encoder created with multifit, I get the following error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-56-448089ca7212> in <module>()
      1 learn_c = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5, metrics=[accuracy]).to_fp16()
----> 2 learn_c.load_encoder('/content/drive/MyDrive/models3/fine_tuned_enc')

1 frames
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
    767         if len(error_msgs) > 0:
    768             raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
--> 769                                self.__class__.__name__, "\n\t".join(error_msgs)))
    770 
    771     def _named_members(self, get_members_fn, prefix='', recurse=True):

RuntimeError: Error(s) in loading state_dict for AWD_LSTM:
    Missing key(s) in state_dict: "rnns.0.weight_hh_l0_raw", "rnns.0.module.weight_ih_l0", "rnns.0.module.weight_hh_l0", "rnns.0.module.bias_ih_l0", "rnns.0.module.bias_hh_l0", "rnns.1.weight_hh_l0_raw", "rnns.1.module.weight_ih_l0", "rnns.1.module.weight_hh_l0", "rnns.1.module.bias_ih_l0", "rnns.1.module.bias_hh_l0", "rnns.2.weight_hh_l0_raw", "rnns.2.module.weight_ih_l0", "rnns.2.module.weight_hh_l0", "rnns.2.module.bias_ih_l0", "rnns.2.module.bias_hh_l0". 
    Unexpected key(s) in state_dict: "rnns.3.layers.0.linear.weight_raw", "rnns.3.layers.0.linear.module.weight", "rnns.3.layers.0.linear.module.bias", "rnns.0.layers.0.linear.weight_raw", "rnns.0.layers.0.linear.module.weight", "rnns.0.layers.0.linear.module.bias", "rnns.1.layers.0.linear.weight_raw", "rnns.1.layers.0.linear.module.weight", "rnns.1.layers.0.linear.module.bias", "rnns.2.layers.0.linear.weight_raw", "rnns.2.layers.0.linear.module.weight", "rnns.2.layers.0.linear.module.bias". 
    size mismatch for encoder.weight: copying a param with shape torch.Size([60000, 400]) from checkpoint, the shape in current model is torch.Size([16, 400]).
    size mismatch for encoder_dp.emb.weight: copying a param with shape torch.Size([60000, 400]) from checkpoint, the shape in current model is torch.Size([16, 400]).
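
To check what the saved encoder actually contains, I dumped its state dict (path as in the traceback; as far as I can tell, save_encoder stores a plain state_dict):

import torch

# Load the saved encoder weights on the CPU and list parameter names and
# shapes. The rnns.N.layers.0.linear.* keys are QRNN-style weights; the
# default AWD_LSTM expects rnns.N.module.weight_ih_l0 and friends instead.
state = torch.load('/content/drive/MyDrive/models3/fine_tuned_enc.pth',
                   map_location='cpu')
for name, tensor in state.items():
    print(name, tuple(tensor.shape))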

It seems that the default AWD_LSTM arch is not compatible with the encoder that multifit creates: the unexpected rnns.N.layers.0.linear.* keys look like QRNN weights rather than the LSTM weights fastai expects, and the encoder.weight size mismatch (60000 vs. 16) points to a vocabulary mismatch between my data_clas and the language model. Is there a way to load the encoder created with multifit into the fastai classifier?
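
In case it is useful, the closest I have gotten to a workaround is overriding fastai's AWD_LSTM config so it matches what the checkpoint seems to contain. This is a sketch only, with several assumptions: fa_config, data_lm, imdb_path and bs come from the snippet above; 4 QRNN layers and emb_sz=400 are read off the checkpoint keys; n_hid=1550 is taken from the paper; and whether TextList accepts vocab alongside fa_config's processor is untested.

from fastai.text import *

# Rebuild the classification DataBunch with the same multifit tokenizer
# config and the LM vocabulary, so token ids line up with the encoder's
# embedding matrix (60000 rows in the checkpoint vs. 16 in my data_clas)
data_clas = (TextList.from_folder(imdb_path, vocab=data_lm.vocab, **fa_config)
             .split_by_folder(valid='test')
             .label_from_folder(classes=['neg', 'pos'])
             .databunch(bs=bs))

# Override the default config: qrnn=True switches the encoder to QRNN
# layers, and n_layers/emb_sz/n_hid are set to match the checkpoint
config = awd_lstm_clas_config.copy()
config.update(qrnn=True, n_layers=4, emb_sz=400, n_hid=1550)

learn_c = text_classifier_learner(data_clas, AWD_LSTM, config=config,
                                  pretrained=False, drop_mult=0.5,
                                  metrics=[accuracy])
learn_c.load_encoder('/content/drive/MyDrive/models3/fine_tuned_enc')

Note that qrnn=True makes fastai build its QRNN extension on first use, and I have not verified that this reproduces the exact multifit architecture.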