alexxony opened 9 months ago
Hello!
This is certainly possible simply because of how precision and recall are calculated. For example, in a binary case, if the number of false negative and false positive samples is the same, and the number of true negative and true positive samples is also the same:
| Predicted \ Actual | Positive | Negative |
|---|---|---|
| Positive | TP=68 | FP=32 |
| Negative | FN=32 | TN=68 |
In this table, the recall is TP / (TP + FN) = 68 / (68 + 32) = 0.68 and precision is TP / (TP + FP) = 68 / (68 + 32) = 0.68. For accuracy, we get (68 + 68) / (68 + 32 + 68 + 32) = 0.68.
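The arithmetic above can be checked in a few lines of Python, using the counts from the table (a minimal sketch; F1 is included since it is the harmonic mean of precision and recall):

```python
# Counts taken from the confusion matrix above.
tp, fp, fn, tn = 68, 32, 32, 68

precision = tp / (tp + fp)                                 # 68 / 100 = 0.68
recall    = tp / (tp + fn)                                 # 68 / 100 = 0.68
f1        = 2 * precision * recall / (precision + recall)  # also 0.68
accuracy  = (tp + tn) / (tp + fp + fn + tn)                # 136 / 200 = 0.68

print(precision, recall, f1, accuracy)
```

Because FP = FN and TP = TN here, all four metrics collapse to the same value.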
However, it admittedly is a bit of a statistical anomaly. How did you calculate the precision, recall and F1?
Also, are you using SetFit? SetFit doesn't output `eval_steps_per_second`, `eval_runtime`, `epoch`, `eval_samples_per_second`, etc.
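The calculation method matters here: for instance, if a `compute_metrics` function uses micro averaging on single-label data, then precision, recall, F1, and accuracy coincide by construction, since the total number of false positives equals the total number of false negatives. A hypothetical sketch with scikit-learn (the toy labels are illustrative, not the asker's data):

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Toy single-label predictions, purely for illustration.
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 0, 0]

acc  = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred, average="micro")
rec  = recall_score(y_true, y_pred, average="micro")
f1   = f1_score(y_true, y_pred, average="micro")

# With micro averaging, summed FP == summed FN, so all four coincide.
print(acc, prec, rec, f1)
```

If the metrics were computed this way, identical values to many decimal places would be expected rather than anomalous.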
```python
trainer.evaluate(eval_dataset=encoded_test_dataset)
```

The output is:

```python
{'eval_loss': 0.6165900230407715, 'eval_precision': 0.681265206812652, 'eval_recall': 0.681265206812652, 'eval_f1': 0.681265206812652, 'eval_accuracy': 0.681265206812652, 'eval_runtime': 2.1712, 'eval_samples_per_second': 189.3, 'eval_steps_per_second': 5.988, 'epoch': 18.72}
```