Hi,
The two should be equivalent if you convert the checkpoint before using evaluation.py (for how to convert the checkpoint for evaluation and inference, please refer to our README). We keep this discrepancy because we want to keep our training code flexible and the inference code as close to HuggingFace as possible.
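For example, with a checkpoint directory at /data/models (hypothetical path), the conversion should be a single command along these lines (double-check the exact flag against the README):

# Convert the SimCSE training checkpoint in place so vanilla transformers can load it.
python simcse_to_huggingface.py --path /data/models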
Thanks for the reply. Actually, I did try running simcse_to_huggingface.py, but it seems like the code is expecting a pytorch_model.bin file, whereas the training script only saved a model.safetensors file. However, it seems like even if you don't have a converted file, you can pass the directory containing the model.safetensors file to from_pretrained's pretrained_model_name_or_path argument (probably because the version of transformers you're using is 4.2.1 and I'm currently using 4.36.2 in my setup).
That is:
# Assuming `model.safetensors` is saved in `/data/models`.
# Note: `from_pretrained` also needs the model's `config.json` in that directory.
from transformers import AutoModel
model = AutoModel.from_pretrained("/data/models")
I think I should be able to use this, but do you know if the two would be different?
Hi,
Yes, if not converted there would be a difference. I believe there should be a way to convert safetensors to pytorch_model.bin? (Not super familiar with the latest transformers version.) Another workaround is to downgrade to 4.2.1.
Yeah, you're right. For anyone else wondering, you can easily convert the safetensors file to the more traditional pytorch_model.bin file as follows:
>>> import torch
>>> from transformers import AutoModel
>>> # Load from the directory containing `model.safetensors` (and `config.json`).
>>> model = AutoModel.from_pretrained(PATH_TO_SAFETENSORS)
>>> # Write the weights out in the classic format, e.g. ".../pytorch_model.bin".
>>> torch.save(model.state_dict(), PATH_TO_PYTORCH)
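If you'd rather stay within the transformers API, save_pretrained can write the .bin file directly; a minimal sketch, assuming a recent transformers version and a hypothetical output directory PATH_TO_PYTORCH_DIR:

>>> from transformers import AutoModel
>>> model = AutoModel.from_pretrained(PATH_TO_SAFETENSORS)
>>> # safe_serialization=False writes `pytorch_model.bin` (plus config) instead of `model.safetensors`.
>>> model.save_pretrained(PATH_TO_PYTORCH_DIR, safe_serialization=False)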
For anyone wondering why HuggingFace saves the model as safetensors, it's not so much HuggingFace itself but the Trainer object. When passing a TrainingArguments object to the Trainer, there's a save_safetensors argument whose default value is set to True. safetensors was already the default loading option as of v4.30.0, but using safetensors as the default saving option was only introduced in v4.35.0. You can read more about it here.
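So if you want the Trainer to keep writing pytorch_model.bin checkpoints, you can just flip that flag; a minimal sketch, assuming transformers >= 4.35 and a hypothetical output directory:

from transformers import TrainingArguments

# Setting save_safetensors=False makes the Trainer write pytorch_model.bin again.
training_args = TrainingArguments(
    output_dir="/data/models",  # hypothetical output directory
    save_safetensors=False,
)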
I noticed that when running evaluation you're using the Trainer's evaluate method, whereas in the evaluation.py script you're not. The models also seem to differ, with the Trainer using a model for CL (e.g., BertForCL) whereas evaluation.py is using a simple HuggingFace pretrained model. Is this intentional? I would think that the models should be the same. Not to mention that there's no checkpoint loading code in evaluation.py either. Please let me know if I'm mistaken. Thanks.