huggingface / setfit

Efficient few-shot learning with Sentence Transformers
https://hf.co/docs/setfit
Apache License 2.0

Same acc, recall, precision, f1 when it comes to trainer.evaluate() #419

Open alexxony opened 9 months ago

alexxony commented 9 months ago

```python
trainer.evaluate(eval_dataset=encoded_test_dataset)
```

output is

```python
{'eval_loss': 0.6165900230407715, 'eval_precision': 0.681265206812652, 'eval_recall': 0.681265206812652, 'eval_f1': 0.681265206812652, 'eval_accuracy': 0.681265206812652, 'eval_runtime': 2.1712, 'eval_samples_per_second': 189.3, 'eval_steps_per_second': 5.988, 'epoch': 18.72}
```

tomaarsen commented 7 months ago

Hello!

This is certainly possible, simply due to how precision and recall are calculated. For example, in a binary case, if the number of false negative and false positive samples is the same, and the number of true negative and true positive samples is also the same:

|                        | Actual Positive | Actual Negative |
|------------------------|-----------------|-----------------|
| **Predicted Positive** | TP = 68         | FP = 32         |
| **Predicted Negative** | FN = 32         | TN = 68         |

In this table, recall is TP / (TP + FN) = 68 / (68 + 32) = 0.68, and precision is TP / (TP + FP) = 68 / (68 + 32) = 0.68. Accuracy is (TP + TN) / (TP + FP + FN + TN) = (68 + 68) / 200 = 0.68, and since precision equals recall, F1 = 2PR / (P + R) also comes out to 0.68.
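The arithmetic can be double-checked in plain Python, taking the counts directly from the table above:

```python
# Confusion-matrix counts from the table: rows are predictions, columns are actual labels
tp, fp, fn, tn = 68, 32, 32, 68

precision = tp / (tp + fp)                    # 68 / 100 = 0.68
recall = tp / (tp + fn)                       # 68 / 100 = 0.68
f1 = 2 * precision * recall / (precision + recall)  # also 0.68 (up to float rounding)
accuracy = (tp + tn) / (tp + fp + fn + tn)    # 136 / 200 = 0.68

print(precision, recall, f1, accuracy)
```

All four metrics coincide exactly because the matrix is symmetric in this way (FP = FN and TP = TN).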

However, it is admittedly a bit of a statistical anomaly. How did you calculate the precision, recall and F1?
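The answer to that question matters, because there is one common setup where identical values are expected rather than anomalous: in single-label multiclass classification, micro-averaged precision, recall and F1 are all mathematically equal to accuracy. A sketch with hypothetical labels, assuming the metrics were computed with scikit-learn's `average="micro"`:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical 3-class labels with 7 of 10 predictions correct
y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2, 0]
y_pred = [0, 1, 1, 1, 2, 2, 2, 0, 2, 0]

# Micro-averaging pools TP/FP/FN over all classes, and in single-label
# classification every error is both a FP (for one class) and a FN (for another)
p, r, f, _ = precision_recall_fscore_support(y_true, y_pred, average="micro")
acc = accuracy_score(y_true, y_pred)

print(p, r, f, acc)  # all four coincide (0.7 here)
```

If that is how the metrics were computed, switching to `average="macro"` or `average="weighted"` would yield per-class-sensitive (and generally distinct) values.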


Also, are you using SetFit? SetFit doesn't output `eval_steps_per_second`, `eval_runtime`, `epoch`, `eval_samples_per_second`, etc.