Hello @fpservant,
I believe this is because accuracy from Hugging Face's evaluate package expects the ground truth to be integer labels rather than strings:
>>> import evaluate
>>> accuracy_metric = evaluate.load("accuracy")
>>> accuracy_metric.compute(references=[0, 1], predictions=[1, 1])
{'accuracy': 0.5}
>>> accuracy_metric.compute(references=["a", "b"], predictions=["b", "b"])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "[sic]\evaluate\module.py", line 432, in compute
self.add_batch(**inputs)
File "[sic]\evaluate\module.py", line 486, in add_batch
batch = self.selected_feature_format.encode_batch(batch)
File "[sic]\datasets\features\features.py", line 1596, in encode_batch
encoded_batch[key] = [encode_nested_example(self[key], obj) for obj in column]
File "[sic]\datasets\features\features.py", line 1596, in <listcomp>
encoded_batch[key] = [encode_nested_example(self[key], obj) for obj in column]
File "[sic]\datasets\features\features.py", line 1203, in encode_nested_example
return schema.encode_example(obj) if obj is not None else None
File "[sic]\datasets\features\features.py", line 465, in encode_example
return int(value)
ValueError: invalid literal for int() with base 10: 'b'
This can be resolved by encoding your labels, e.g.:
# Map each string label to an integer id
label_to_int = {label: idx for idx, label in enumerate(df["label"].unique())}
# {'LABEL1': 0, 'LABEL2': 1, 'LABEL3': 2}
df["label"] = df["label"].map(label_to_int)
This now outputs:
model_head.pkl not found on HuggingFace Hub, initialising classification head with random weights. You should TRAIN this model on a downstream task to use it for predictions and inference.
***** Running training *****
Num examples = 72
Num epochs = 1
Total optimization steps = 18
Total train batch size = 4
Iteration: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 18/18 [00:06<00:00, 2.99it/s]
Epoch: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:06<00:00, 6.02s/it]
***** Running evaluation *****
{'accuracy': 1.0}
I don't believe this is mentioned in the README, and we don't currently have any docs set up. Apologies for this. When docs are set up, this should certainly be included. Alternatively, we could implement encoding behind the scenes as a pre-processing step before training whenever we encounter string labels.
Hope this helps somewhat.
Hi @tomaarsen, thank you for your answer. The behavior is a bit surprising, as trainer.train seems to work perfectly with the string labels (and the SetFitTrainer knows how to convert from a string label to an index, as shown by the results of trainer.model.predict and trainer.model.predict_proba). So it seems to me that there is already some form of "encoding behind the scenes" at the trainer level, and it would be more user-friendly if it also worked for evaluate. But anyway, thank you very much, your answer was fast, informative and helpful. Best Regards, fps
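For illustration, calls like the following work directly with the string labels (the input text and outputs here are only illustrative, not taken from the original report):

# predict returns the original string labels; predict_proba returns per-class probabilities
preds = trainer.model.predict(["an example sentence"])        # e.g. ['LABEL2']
probas = trainer.model.predict_proba(["an example sentence"])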
I agree, it is odd: (I believe) only the evaluation breaks, while everything else works correctly. It should be possible to counteract this by encoding labels only when we use a Hugging Face evaluate metric, and only in the trainer.evaluate call. However, there is also something to be said for simply filing this as a feature request for Hugging Face's evaluate instead.
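Roughly, such library-side encoding could look something like the sketch below (purely illustrative, not existing SetFit code):

import evaluate

def evaluate_with_string_labels(y_pred, y_test):
    # Hypothetical pre-processing: map string labels to integers
    # before handing them to Hugging Face's accuracy metric
    label_to_int = {label: idx for idx, label in enumerate(sorted(set(y_test) | set(y_pred)))}
    metric = evaluate.load("accuracy")
    return metric.compute(
        references=[label_to_int[y] for y in y_test],
        predictions=[label_to_int[y] for y in y_pred],
    )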
In case you're looking for alternatives, another option that I just thought of is that you can also provide a callable as the metric:
https://github.com/huggingface/setfit/blob/eee595ede962b9a9bbe62d4d919f5629d2fc2868/src/setfit/trainer.py#L427-L428
That way, you can provide your own function for accuracy or other metrics that does allow string labels.
I met the same problem today, and I found that instead of using load_metric from the datasets package, you can just use accuracy_score from sklearn to build compute_metrics as follows:

from sklearn.metrics import accuracy_score

def compute_metrics(y_pred, y_test):
    accuracy = accuracy_score(y_test, y_pred)
    return {"accuracy": accuracy}

This solves the problem.
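For completeness, a sketch of wiring such a callable into the trainer (the model name and toy data below are placeholders, not taken from the original report):

from datasets import Dataset
from setfit import SetFitModel, SetFitTrainer
from sklearn.metrics import accuracy_score

def compute_metrics(y_pred, y_test):
    # sklearn's accuracy_score handles string labels directly
    return {"accuracy": accuracy_score(y_test, y_pred)}

# Toy dataset with string labels, just to show the wiring
dataset = Dataset.from_dict({"text": ["great", "awful", "fine", "bad"],
                             "label": ["POS", "NEG", "POS", "NEG"]})

model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")
trainer = SetFitTrainer(
    model=model,
    train_dataset=dataset,
    eval_dataset=dataset,
    metric=compute_metrics,  # callable instead of the default "accuracy" string
)
trainer.train()
trainer.evaluate()  # accuracy computed via sklearn, so string labels are fine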
This is actually a really awful inconvenience that still exists today. Please fix.
Closed via #439
I still get that error with the Seq2Seq trainer, and I am using the latest version of transformers.
SetFit doesn't offer a Seq2Seq trainer; that is specific to transformers: https://huggingface.co/docs/transformers/en/main_classes/trainer#transformers.Seq2SeqTrainer
I meant to refer to the functionality of the accuracy metric from Hugging Face (i.e., that it only works with integer labels).
ValueError in trainer.evaluate under the following conditions: Dataset created from a pandas DataFrame, with a label column containing strings.
Here is what I do:
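Roughly, a minimal sketch of this kind of setup (the model name, column names and data below are placeholders, not the original code):

import pandas as pd
from datasets import Dataset
from setfit import SetFitModel, SetFitTrainer

# DataFrame whose label column contains strings
df = pd.DataFrame({
    "text": ["great product", "terrible product", "it is okay"],
    "label": ["POSITIVE", "NEGATIVE", "NEUTRAL"],
})
dataset = Dataset.from_pandas(df)

model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")
trainer = SetFitTrainer(
    model=model,
    train_dataset=dataset,
    eval_dataset=dataset,
    metric="accuracy",  # the default Hugging Face evaluate metric
)
trainer.train()     # works with string labels
trainer.evaluate()  # raises the ValueError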
Here's the output of last line:
Best Regards, fps