lightonai / pylate

Late Interaction Models Training & Retrieval
https://lightonai.github.io/pylate/
MIT License

distilbert raises an error #51

Closed CosimoRulli closed 1 month ago

CosimoRulli commented 1 month ago

Hi,

thank you for building this amazing repo. My goal is to train a ColBERT model on MS MARCO using DistilBERT as the backbone. I took your script knowledge_distillation.py and replaced bert-base-uncased with distilbert/distilbert-base-uncased.
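
The relevant lines look like this (a minimal sketch; model_name is the only change from the original script):

    from pylate import models

    model_name = "distilbert/distilbert-base-uncased"
    model = models.ColBERT(model_name_or_path=model_name)

Running this gives the following error: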

  File "/code/examples/train/knowledge_distillation.py", line 40, in <module>
    model = models.ColBERT(model_name_or_path=model_name)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.conda/envs/pylate/lib/python3.12/site-packages/pylate/models/colbert.py", line 232, in __init__
    super(ColBERT, self).__init__(
  File "/home/user/.conda/envs/pylate/lib/python3.12/site-packages/sentence_transformers/SentenceTransformer.py", line 299, in __init__
    modules = self._load_auto_model(
              ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.conda/envs/pylate/lib/python3.12/site-packages/pylate/models/colbert.py", line 1086, in _load_auto_model
    transformer_model = Transformer(
                        ^^^^^^^^^^^^
  File "/home/user/.conda/envs/pylate/lib/python3.12/site-packages/sentence_transformers/models/Transformer.py", line 54, in __init__
    self._load_model(model_name_or_path, config, cache_dir, **model_args)
  File "/home/user/.conda/envs/pylate/lib/python3.12/site-packages/sentence_transformers/models/Transformer.py", line 85, in _load_model
    self.auto_model = AutoModel.from_pretrained(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.conda/envs/pylate/lib/python3.12/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.conda/envs/pylate/lib/python3.12/site-packages/transformers/modeling_utils.py", line 3832, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: DistilBertModel.__init__() got an unexpected keyword argument 'add_pooling_layer'

Am I doing something wrong?

NohTow commented 1 month ago

Hello, thanks for reporting!

No, you are not doing anything wrong. The add_pooling_layer argument is manually set to False to prevent BERT models from adding their extra pooling layer, but I did not realize the argument is not supported by all architectures.
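
For illustration, here is a sketch of one possible guard (not necessarily the actual PyLate fix): only pass add_pooling_layer when the resolved architecture accepts it. Note that _model_mapping is a private transformers attribute.

    import inspect

    from transformers import AutoConfig, AutoModel

    model_name = "distilbert/distilbert-base-uncased"
    config = AutoConfig.from_pretrained(model_name)

    # Resolve the concrete model class AutoModel would instantiate for this config.
    model_class = AutoModel._model_mapping[type(config)]

    model_kwargs = {}
    if "add_pooling_layer" in inspect.signature(model_class.__init__).parameters:
        # BERT-style models accept the argument; DistilBERT, among others, does not.
        model_kwargs["add_pooling_layer"] = False

    model = AutoModel.from_pretrained(model_name, **model_kwargs)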

I am currently reworking the loading logic so that stanford-nlp ColBERT models can be loaded without explicitly converting them to PyLate models, so I'll include a fix for this as well!

CosimoRulli commented 1 month ago

Sounds great, thank you very much!

NohTow commented 1 month ago

Hello,

This should be fixed by #52; note that you have to update the library for the fix to apply. Feel free to report if you encounter other issues!
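
For example, assuming you installed from PyPI:

    pip install --upgrade pylate

If you installed from source, pull the latest changes instead.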

NohTow commented 1 month ago

@CosimoRulli did you have the chance to retry with the fix? Feel free to close the issue if it has been solved!

CosimoRulli commented 1 month ago

Hey, sorry, I forgot to close the issue. Your fix solved the problem, thank you very much for your help! Just FYI: when I run the training, I get the following error, but it does not cause the training to stop (I am currently ignoring it):

KeyError: 'scores'
Consider opening an issue on https://github.com/UKPLab/sentence-transformers/issues with this traceback.
Skipping model card creation.

I will also report the evaluation of my model, since it may be useful for you or other users. I trained distilbert for 400k steps, lr 1e-5, batch size 32. My evaluation on nfcorpus with k=10 yields the following results (using your evaluation script):

    Metric      Value
    map         0.1358
    ndcg@10     0.3508
    ndcg@100    0.2300
    recall@10   0.1671
    recall@100  0.1671
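
For context, this is roughly what such a BEIR-style evaluation looks like with PyLate, sketched from memory of the documented example (details such as the load_beir signature and keyword names may differ in your version):

    from pylate import evaluation, indexes, models, retrieve

    model = models.ColBERT(model_name_or_path="path/to/trained-distilbert-colbert")

    # Build an index over the corpus embeddings.
    index = indexes.Voyager(index_folder="pylate-index", index_name="nfcorpus", override=True)
    retriever = retrieve.ColBERT(index=index)

    documents, queries, qrels = evaluation.load_beir("nfcorpus", split="test")

    documents_embeddings = model.encode(
        [document["text"] for document in documents],
        batch_size=32,
        is_query=False,
    )
    index.add_documents(
        documents_ids=[document["id"] for document in documents],
        documents_embeddings=documents_embeddings,
    )

    queries_embeddings = model.encode(queries, batch_size=32, is_query=True)
    scores = retriever.retrieve(queries_embeddings=queries_embeddings, k=10)

    print(
        evaluation.evaluate(
            scores=scores,
            qrels=qrels,
            queries=queries,
            metrics=["map", "ndcg@10", "ndcg@100", "recall@10", "recall@100"],
        )
    )
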
NohTow commented 1 month ago

Yeah, the error you are getting happens in every training run. It comes from the fact that, during collation, I remove the parts of the dataset that are not needed for inference (the ids, after fetching the actual texts), and those columns are required by the model card creation function. To be honest, I have procrastinated on fixing it because it does not hurt the training; it just means the model card on Hugging Face is not the best-looking possible. But I should fix it some day.
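
A hedged illustration of the failure mode (not PyLate's actual code): the collator resolves ids to texts and keeps only what the forward pass needs, so a later lookup by the model card builder fails.

    # Each training row pairs ids with teacher scores.
    row = {"query_id": 7, "document_ids": [3, 1], "scores": [0.9, 0.2]}

    def collate(rows, queries, documents):
        # Resolve ids to texts; move teacher scores to 'labels'; drop raw columns.
        return {
            "queries": [queries[r["query_id"]] for r in rows],
            "documents": [[documents[i] for i in r["document_ids"]] for r in rows],
            "labels": [r["scores"] for r in rows],
        }

    batch = collate([row], queries={7: "what is late interaction?"}, documents={3: "doc a", 1: "doc b"})
    batch["scores"]  # KeyError: 'scores' -- training itself uses 'labels' and is unaffected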

About the model you trained, it's hard to make an estimation, but on NFCorpus, colbert-small (the best ColBERT) is at 37.3, ColBERTv2 is at 33.8, and v1 is at 30.5 (ndcg@10), so your numbers look pretty good!

Closing this as it seems resolved; feel free to open a new issue or comment if needed!