Prashant-Baliyan opened this issue 1 month ago
+1
I solved the error by updating the tokenizers and transformers libraries with `pip install -U`.
@jmatzat - which versions of tokenizers and transformers should I go with? As you can see above, I am using transformers_version: "4.28.1".
And one more thing: where do I have to run the `pip install -U` command, during fine-tuning or during validation? We have a different instance for each.
I encountered the problem while loading the SetFit model via `from_pretrained`.
tokenizers_version: 0.19.1
transformers_version: 4.40.2
You might have to update scikit-learn as well, after updating tokenizers and transformers.
@jmatzat I tried to update the tokenizers and transformers versions, but ended up with the error below:
transformers 4.30.2 requires tokenizers!=0.11.3,<0.14,>=0.11.1, but you have tokenizers 0.19.1 which is incompatible.
from setfit import SetFitModel
ImportError: tokenizers>=0.11.1,!=0.11.3,<0.14 is required for a normal functioning of this module, but found tokenizers==0.19.1.
Have you tried updating setfit as well?
Updating tokenizers and transformers might require you to update other packages that depend on them as well.
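To illustrate the constraint in the error message above, here is a minimal stdlib-only sketch (the `parse` and `satisfies` helpers are hypothetical; the version bounds are copied from the pip error for transformers 4.30.2) that checks whether a given tokenizers version falls inside the accepted range:

```python
def parse(v):
    """Turn a dotted version string like '0.19.1' into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

def satisfies(v):
    # transformers 4.30.2 requires tokenizers >=0.11.1, !=0.11.3, <0.14
    return (parse("0.11.1") <= parse(v) < parse("0.14")
            and parse(v) != parse("0.11.3"))

print(satisfies("0.19.1"))  # False - too new for transformers 4.30.2
print(satisfies("0.13.3"))  # True - inside the declared range
```

This is why upgrading tokenizers alone is not enough: the installed transformers release must also be new enough to accept the tokenizers version you end up with.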
@jmatzat - yes, I tried two setfit versions, setfit==0.7.0 and 1.0.3. But as you can see, the transformers version itself is not compatible with the tokenizers version in the first place.
Hi - currently we are fine-tuning the model "paraphrase-multilingual-MiniLM-L12-v2" for our use case. In our pipeline, we have a model validation step where we load the trained model with:
model = SetFitModel.from_pretrained(model_dir)
but unfortunately we are getting the exception below:
Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 83 column 3.
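This exception typically means the tokenizer.json saved alongside the model was serialized by a newer tokenizers release than the one now deserializing it, so its pre_tokenizer section contains a variant the older parser does not recognize. A small sketch for checking which variant the saved file records (the function name and file path are illustrative, not part of any library API):

```python
import json

def pre_tokenizer_type(tokenizer_json_path):
    """Return the pre_tokenizer variant recorded in a saved tokenizer.json.

    The 'untagged enum PyPreTokenizerTypeWrapper' error is raised when the
    installed tokenizers release does not know this variant, i.e. the file
    was written by a newer tokenizers than the one reading it.
    """
    with open(tokenizer_json_path) as f:
        cfg = json.load(f)
    pre = cfg.get("pre_tokenizer") or {}
    return pre.get("type")
```

If the reported variant is one your installed tokenizers predates, aligning the tokenizers version between the environment that saved the model and the one loading it should resolve the mismatch.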
Note: I am using the Amazon SageMaker platform for fine-tuning, with the configuration below:
for training: instance_type: "ml.g5.2xlarge", instance_count: 1, transformers_version: "4.28.1", pytorch_version: "2.0.0", setfit_version: "0.7.0", py_version: "py310"
for validation: instance_type: "ml.t3.xlarge", instance_count: 1
It was working fine with the above configuration, but for the last couple of days we have been getting the above-mentioned exception. It would be great if anyone could help us fix the issue.
Do let me know if any other information is required from our side.
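One plausible reading of this thread: the training container pins transformers 4.28.1, while the validation instance installs whatever tokenizers/transformers are current at run time (the 0.19.1 / 4.40.2 pair quoted above), so the two environments drifted apart when new releases shipped. A sketch of a requirements pin for the validation instance, using only the versions already mentioned in this thread (these exact pins are an assumption, not a verified working set):

```
# illustrative requirements.txt for the validation instance,
# matching the training container quoted above
transformers==4.28.1
setfit==0.7.0
tokenizers>=0.11.1,!=0.11.3,<0.14
```

Pinning both environments to the same versions avoids breakage whenever upstream packages release a new version.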