huggingface / setfit

Efficient few-shot learning with Sentence Transformers
https://hf.co/docs/setfit
Apache License 2.0
2.24k stars 223 forks

how to optimize setfit inference #519

Closed geraldstanje closed 5 months ago

geraldstanje commented 6 months ago

Hi,

I'm currently investigating what options we have to optimize SetFit inference, and I have a few questions:

Does torch.compile also work on CPU? Edit: it looks like it should work on CPU too.

https://pytorch.org/docs/stable/generated/torch.compile.html

Does torch.compile change anything about the accuracy of model inference?

I see different modes documented there: "default", "reduce-overhead", "max-autotune", or "max-autotune-no-cudagraphs". So far "reduce-overhead" gives the best results.

Are there any other resources for speeding up SetFit model inference? Where can you run a SetFit model besides TorchServe?
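One other common CPU inference speedup worth mentioning is dynamic int8 quantization of the Linear layers. The sketch below uses a small stand-in module; applying it to a real SetFit model (by passing `model.model_body` instead of `net`) is an assumption on my part, not something the SetFit docs prescribe.

```python
import torch

# Stand-in module; for SetFit you would quantize the transformer body, e.g.
#   quantize_dynamic(model.model_body, {torch.nn.Linear}, dtype=torch.qint8)
net = torch.nn.Sequential(
    torch.nn.Linear(16, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 2),
)

# Replace Linear layers with dynamically quantized int8 versions:
# weights are stored as int8, activations are quantized on the fly.
qnet = torch.ao.quantization.quantize_dynamic(
    net, {torch.nn.Linear}, dtype=torch.qint8
)

x = torch.randn(4, 16)
with torch.no_grad():
    out = qnet(x)
print(out.shape)  # same output shape as the float model, smaller weights
```

The trade-off is a small accuracy drop in exchange for lower memory use and faster CPU matmuls, so it is worth benchmarking against your evaluation set before deploying.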

Thanks, Gerald