huggingface / setfit

Efficient few-shot learning with Sentence Transformers
https://hf.co/docs/setfit
Apache License 2.0

Same acc, recall, precision, f1 when it comes to trainer.evaluate() #419

Open alexxony opened 9 months ago

alexxony commented 9 months ago

```python
trainer.evaluate(eval_dataset=encoded_test_dataset)
```

output is

```python
{'eval_loss': 0.6165900230407715, 'eval_precision': 0.681265206812652, 'eval_recall': 0.681265206812652, 'eval_f1': 0.681265206812652, 'eval_accuracy': 0.681265206812652, 'eval_runtime': 2.1712, 'eval_samples_per_second': 189.3, 'eval_steps_per_second': 5.988, 'epoch': 18.72}
```

tomaarsen commented 7 months ago

Hello!

This is certainly possible, simply due to how precision and recall are calculated. For example, in a binary case, if the number of false negative and false positive samples is the same, and the number of true negative and true positive samples is also the same:

|                        | Actual Positive | Actual Negative |
|------------------------|-----------------|-----------------|
| **Predicted Positive** | TP = 68         | FP = 32         |
| **Predicted Negative** | FN = 32         | TN = 68         |

In this table, recall is TP / (TP + FN) = 68 / (68 + 32) = 0.68, and precision is TP / (TP + FP) = 68 / (68 + 32) = 0.68. Accuracy is (TP + TN) / (TP + FP + FN + TN) = (68 + 68) / 200 = 0.68, and since precision equals recall, F1 = 2PR / (P + R) also comes out to 0.68.
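The arithmetic can be double-checked in plain Python, taking the counts directly from the table above:

```python
# Confusion-matrix counts from the table: rows are predictions, columns are actual labels
tp, fp, fn, tn = 68, 32, 32, 68

precision = tp / (tp + fp)                    # 68 / 100 = 0.68
recall = tp / (tp + fn)                       # 68 / 100 = 0.68
f1 = 2 * precision * recall / (precision + recall)  # also 0.68 (up to float rounding)
accuracy = (tp + tn) / (tp + fp + fn + tn)    # 136 / 200 = 0.68

print(precision, recall, f1, accuracy)
```

All four metrics coincide exactly because the matrix is symmetric in this way (FP = FN and TP = TN).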

However, it is admittedly a bit of a statistical anomaly. How did you calculate the precision, recall and F1?
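The answer to that question matters, because there is one common setup where identical values are expected rather than anomalous: in single-label multiclass classification, micro-averaged precision, recall and F1 are all mathematically equal to accuracy. A sketch with hypothetical labels, assuming the metrics were computed with scikit-learn's `average="micro"`:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical 3-class labels with 7 of 10 predictions correct
y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2, 0]
y_pred = [0, 1, 1, 1, 2, 2, 2, 0, 2, 0]

# Micro-averaging pools TP/FP/FN over all classes, and in single-label
# classification every error is both a FP (for one class) and a FN (for another)
p, r, f, _ = precision_recall_fscore_support(y_true, y_pred, average="micro")
acc = accuracy_score(y_true, y_pred)

print(p, r, f, acc)  # all four coincide (0.7 here)
```

If that is how the metrics were computed, switching to `average="macro"` or `average="weighted"` would yield per-class-sensitive (and generally distinct) values.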


Also, are you using SetFit? SetFit doesn't output `eval_steps_per_second`, `eval_runtime`, `epoch`, `eval_samples_per_second`, etc.