huggingface / setfit

Efficient few-shot learning with Sentence Transformers
https://hf.co/docs/setfit
Apache License 2.0

Use Setfit to classify text in a very specific field of a business segment #457

Open decunde opened 11 months ago

decunde commented 11 months ago

I'm trying to use a few-shot approach because I have only a few labeled samples, and the data contains very specific jargon, acronyms, and terminology. Is there a specific strategy for using SetFit to achieve good accuracy in this kind of scenario? Sorry if this is a very general question, but after reading through the repositories I'm still not confident this is the right approach. Thanks.

tomaarsen commented 11 months ago

Hello! To my knowledge, there is no special strategy for your task. A good approach may be to get a simple few-shot training script up and running, and then experiment with different (open) embedding models from MTEB.

Otherwise, I've always enjoyed using sentence-transformers/paraphrase-mpnet-base-v2 with SetFit. I think you should be able to reach reasonably good performance with SetFit on your task, even if it is a bit niche.
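
For reference, a minimal sketch of such a script, assuming setfit 1.0 or later and the datasets library are installed. The helper names (`sample_few_shot`, `train_and_evaluate`, `compare_models`) are hypothetical. The candidate list is only an illustrative pick: apart from paraphrase-mpnet-base-v2, the models were not suggested in this thread. The rows are assumed to be dicts with "text" and "label" keys.

```python
import random
from collections import defaultdict

# Illustrative candidates only; swap in whichever MTEB models suit your domain.
CANDIDATE_MODELS = [
    "sentence-transformers/paraphrase-mpnet-base-v2",  # mentioned in this thread
    "sentence-transformers/all-MiniLM-L6-v2",          # assumption: small baseline
    "BAAI/bge-small-en-v1.5",                          # assumption: small MTEB model
]


def sample_few_shot(rows, shots_per_class, seed=0):
    """Pick up to `shots_per_class` rows per label from a list of
    {"text": ..., "label": ...} dicts (hypothetical helper)."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row["label"]].append(row)
    rng = random.Random(seed)
    sampled = []
    for label in sorted(by_label):
        group = by_label[label]
        sampled.extend(rng.sample(group, min(shots_per_class, len(group))))
    return sampled


def train_and_evaluate(model_name, train_rows, eval_rows):
    """Fine-tune one embedding model with SetFit and return its eval metrics."""
    # Imported here so the sampling helper works without these packages.
    from datasets import Dataset
    from setfit import SetFitModel, Trainer, TrainingArguments

    model = SetFitModel.from_pretrained(model_name)
    args = TrainingArguments(batch_size=16, num_epochs=1)
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=Dataset.from_list(train_rows),
        eval_dataset=Dataset.from_list(eval_rows),
        metric="accuracy",
    )
    trainer.train()
    return trainer.evaluate()


def compare_models(labeled_rows, eval_rows, shots_per_class=8):
    """Train every candidate on the same few-shot sample and collect metrics."""
    train_rows = sample_few_shot(labeled_rows, shots_per_class)
    return {
        name: train_and_evaluate(name, train_rows, eval_rows)
        for name in CANDIDATE_MODELS
    }
```

Calling `compare_models(labeled_rows, eval_rows)` with your own domain data returns one metrics dict per model, so you can see which embedding model copes best with your jargon and acronyms. Keep the evaluation rows separate from the rows you sample for training.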