huggingface / setfit

Efficient few-shot learning with Sentence Transformers
https://hf.co/docs/setfit
Apache License 2.0

Displayed "Num examples" after the training start seems wrong #473

Closed (michael-brunzel closed this issue 9 months ago)

michael-brunzel commented 10 months ago

In the current 1.x implementation, the number of examples (after the contrastive pairs are created) is logged in trainer.py as: "logger.info(f" Num examples = {len(train_dataloader)}")"

This means the logged value is the number of batches, which equals the number of examples only when the batch size is 1. That is misleading at the very least. In the tutorials (which were apparently prepared for version 0.x), the displayed value does seem to be correct.
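To illustrate the gap, here is a minimal sketch in plain Python. The helper `num_batches` is hypothetical and not part of setfit. It only mimics what `len()` of a PyTorch `DataLoader` reports for a map-style dataset, and the pair count and batch size are made-up numbers, not values from the notebook.

```python
import math


def num_batches(num_examples: int, batch_size: int, drop_last: bool = False) -> int:
    """Hypothetical helper: mimics len(DataLoader) for a map-style dataset."""
    if drop_last:
        # An incomplete final batch is discarded.
        return num_examples // batch_size
    # An incomplete final batch still counts as one batch.
    return math.ceil(num_examples / batch_size)


# Assumed values for illustration only.
num_pairs = 1280
batch_size = 16

batches = num_batches(num_pairs, batch_size)
print(f"value logged as 'Num examples' (batches): {batches}")
print(f"actual number of contrastive pairs:       {num_pairs}")
```

With a batch size of 1 the two numbers coincide, which matches the observation above. For any larger batch size the logged value is smaller by roughly a factor of the batch size.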

Could somebody clarify whether this new calculation in the 1.x versions is intended or actually a bug?

This behaviour can easily be reproduced by running https://github.com/huggingface/setfit/blob/main/notebooks/text-classification.ipynb with a 1.x version and then with a 0.x version for comparison.

Thanks in advance!

tomaarsen commented 9 months ago

Hello!

I think you're right, it seems that this is now incorrect. I'll have a look into this! Thanks for reporting.