huggingface / setfit

Efficient few-shot learning with Sentence Transformers
https://hf.co/docs/setfit
Apache License 2.0

Exception: enum PyPreTokenizerTypeWrapper while loading the fine-tuned model for evaluation #520

Open Prashant-Baliyan opened 1 month ago

Prashant-Baliyan commented 1 month ago

Hi - we are currently fine-tuning the model "paraphrase-multilingual-MiniLM-L12-v2" for our use case. In our pipeline, the model validation step loads the trained model with:

model = SetFitModel.from_pretrained(model_dir)

but unfortunately, we are getting the exception below:

Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 83 column 3.
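This exception is typically reported when the tokenizer.json written at save time comes from a newer tokenizers release than the one installed in the environment that loads it: the older reader hits a pre-tokenizer enum variant it does not know. A minimal sketch of that compatibility check, using only the standard library (the version numbers are illustrative assumptions, not official compatibility data):

```python
# Sketch: compare the tokenizers version used at save time against the
# one available at load time. Versions here are illustrative only.

def parse_version(v: str) -> tuple:
    """Turn a version string like '0.19.1' into (0, 19, 1) for comparison."""
    return tuple(int(part) for part in v.split(".")[:3])

def can_load(saved_with: str, loading_with: str) -> bool:
    """A tokenizer.json written by a newer tokenizers release may contain
    enum variants (e.g. pre-tokenizer types) that an older reader cannot
    parse, producing the 'untagged enum PyPreTokenizerTypeWrapper' error."""
    return parse_version(loading_with) >= parse_version(saved_with)

# Example: training instance on 0.19.1, validation instance still on 0.13.x
print(can_load("0.19.1", "0.13.3"))  # → False: upgrade before loading
print(can_load("0.19.1", "0.19.1"))  # → True
```

In other words, the first thing to verify is whether the training and validation instances resolved different tokenizers versions when their environments were built.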

Note: I am using the Amazon SageMaker platform for fine-tuning, with the configuration below:

For training:
instance_type: "ml.g5.2xlarge"
instance_count: 1
transformers_version: "4.28.1"
pytorch_version: "2.0.0"
setfit_version: "0.7.0"
py_version: "py310"

For validation:
instance_type: "ml.t3.xlarge"
instance_count: 1

This was working fine with the above configuration, but for the last couple of days we have been getting the exception mentioned above. It would be great if anyone could help us fix the issue.

Let me know if any other information is required from our side.

PedroGarciasPainkillers commented 1 month ago

+1

jmatzat commented 1 month ago

I solved the error by updating the tokenizers and transformers libraries with pip install -U tokenizers transformers

Prashant-Baliyan commented 1 month ago

@jmatzat - which versions of tokenizers and transformers should I go with? As you can see above, I am using transformers_version: "4.28.1"

And one more thing: where do I have to run the pip -U command to update the versions, during fine-tuning or during validation? We have different instances for each.

jmatzat commented 1 month ago

I encountered the problem while loading the SetFit model with from_pretrained.

tokenizers_version: 0.19.1

transformers_version: 4.40.2

You might have to update scikit-learn as well, after updating tokenizers and transformers.
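Since training and validation run on different instances, it helps to print the installed versions on both before changing anything. A small helper using only the standard library (importlib.metadata, available from Python 3.8):

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(package: str) -> str:
    """Return the installed version of a distribution, or 'not installed'."""
    try:
        return version(package)
    except PackageNotFoundError:
        return "not installed"

# Run this on both the training and the validation instance and compare
# the outputs; a mismatch in tokenizers is the usual culprit here.
for pkg in ("tokenizers", "transformers", "setfit", "scikit-learn"):
    print(f"{pkg}: {installed_version(pkg)}")
```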

Prashant-Baliyan commented 1 month ago

@jmatzat I tried updating the tokenizers and transformers versions, but ended up with the error below.

jmatzat commented 1 month ago

Have you tried updating SetFit as well?

Updating tokenizers and transformers might require you to update other packages that depend on them as well.

Prashant-Baliyan commented 1 month ago

@jmatzat - yes, I tried two SetFit versions, setfit==0.7.0 and 1.0.3. But as you can see, the transformers version itself is not compatible with the tokenizers version in the first place.
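When upgrading packages one at a time keeps producing conflicts, pinning a mutually consistent set in a single install often resolves it. A sketch of such a pin set, using the tokenizers and transformers versions jmatzat reported; the setfit pin is an assumption for compatibility with transformers 4.40.x, not a tested combination:

```
# requirements.txt sketch - tokenizers/transformers versions are from this
# thread; the setfit pin is an assumption, verify against the release notes
tokenizers==0.19.1
transformers==4.40.2
setfit==1.0.3
```

Installing all three together (pip install -r requirements.txt) lets the resolver surface any remaining conflict up front, instead of failing at model-load time on whichever instance was upgraded last.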