Hello, thanks for reporting!
No, you are not doing anything wrong. `add_pooling_layer` is manually set to `False` to prevent the additional pooling layer from being added to BERT models, but I did not realize it was not supported by all models.
I am currently reworking the loading logic so that stanford-nlp ColBERT models can be loaded without explicitly converting them to PyLate models, so I'll also include a fix for this!
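For reference, a minimal sketch of that kind of fix, assuming nothing about PyLate's actual implementation: forward `add_pooling_layer=False` only when the backbone class accepts it, and fall back otherwise.

```python
# Illustrative sketch only (not PyLate's code): BERT-style models accept
# add_pooling_layer, while architectures such as DistilBERT do not.
from transformers import AutoModel


def load_backbone(model_name: str):
    """Load a Transformer backbone, disabling the pooling layer when possible."""
    try:
        # BERT-like classes accept add_pooling_layer and skip building the pooler.
        return AutoModel.from_pretrained(model_name, add_pooling_layer=False)
    except TypeError:
        # DistilBERT has no pooling layer to disable, so the keyword is
        # rejected by its constructor; load the model without it.
        return AutoModel.from_pretrained(model_name)


# Example usage (both should load without raising):
# load_backbone("bert-base-uncased")
# load_backbone("distilbert/distilbert-base-uncased")
```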
Sounds great, thank you very much!
Hello,
This should have been fixed by #52. Feel free to report if you encounter other issues! (You will need to update the library for the fix to apply.)
@CosimoRulli did you have the chance to retry with the fix? Feel free to close the issue if it has been solved!
Hey, sorry, I forgot to close the issue. Your fix solved the problem, thank you very much for your help! Just FYI, when I run the training I get the following error, but it does not cause the training to stop (I am currently ignoring it):
KeyError: 'scores'
Consider opening an issue on https://github.com/UKPLab/sentence-transformers/issues with this traceback.
Skipping model card creation.
I will report the evaluation results of my model; they may be useful to you or to some of the other users.
I have trained `distilbert` for 400k steps, lr 1e-5, batch size 32. My evaluation on nfcorpus, with k=10, yields the following results (using your evaluation script):
| Metric | Value |
|---|---|
| map | 0.1358 |
| ndcg@10 | 0.3508 |
| ndcg@100 | 0.2300 |
| recall@10 | 0.1671 |
| recall@100 | 0.1671 |
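For anyone reproducing this kind of table, here is a hedged sketch of how such metrics can be computed from a retrieval run with the `ranx` library; this is only an illustration, not necessarily the evaluation script mentioned above, and the qrels/run dictionaries are placeholder data.

```python
# Hedged sketch: computing the metrics from the table above with ranx.
# The qrels_dict / run_dict contents are placeholders, not data from this thread.
from ranx import Qrels, Run, evaluate

qrels_dict = {
    "q1": {"d1": 2, "d3": 1},  # graded relevance judgments per query
}
run_dict = {
    "q1": {"d1": 0.9, "d2": 0.5, "d3": 0.4},  # retrieval scores per query
}

qrels = Qrels(qrels_dict)
run = Run(run_dict)

# Same metric set as the table above.
print(evaluate(qrels, run, ["map", "ndcg@10", "ndcg@100", "recall@10", "recall@100"]))
```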
Yeah, the error you are getting happens in every training run. It comes from the fact that, during collation, I remove the parts of the dataset that are not needed for inference (the ids, once the actual texts have been retrieved), and those columns are required by the model card creation function. To be honest, I have procrastinated on fixing it because it does not hurt training; it just means the generated Hugging Face model card is not the best it could be. But I should fix it some day.
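To make the mechanism concrete, here is a generic, purely illustrative sketch (not PyLate's actual collator): once the id columns have been resolved to texts and the raw columns dropped, any later code that still expects one of the original columns, such as `scores`, raises a `KeyError`, while training itself is unaffected.

```python
# Generic illustration only, not PyLate's collator: the raw dataset columns are
# replaced by the resolved texts, so a later lookup of a dropped column fails.
queries = {0: "what is colbert?"}
documents = {3: "ColBERT is a late-interaction retrieval model."}

raw_batch = [{"query_id": 0, "document_ids": [3], "scores": [12.5]}]

collated = {
    "query": [queries[row["query_id"]] for row in raw_batch],
    "documents": [[documents[d] for d in row["document_ids"]] for row in raw_batch],
}

try:
    collated["scores"]  # e.g. the model card step asks for a column that was dropped
except KeyError as err:
    print(f"KeyError: {err}")  # harmless here: the batch already has what training needs
```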
About the model you trained, it's hard to give an estimate, but on NFCorpus (ndcg@10) colbert-small (the best ColBERT) is at 37.3, ColBERTv2 is at 33.8, and v1 is at 30.5, so your result looks pretty good!
Closing this as it seems resolved, feel free to open a new issue/comment if needed!
Hi,
thank you for building this amazing repo. My goal is to train a ColBERT model on MS MARCO using `distilbert` as the backbone. I took your script `knowledge_distillation.py` and replaced `bert-base-uncased` with `distilbert/distilbert-base-uncased`, but I get the following error:

Am I doing something wrong?