Hi,
I'm currently investigating what options we have to optimize SetFit inference and have a few questions:
```python
model.model_body[0].auto_model = torch.compile(model.model_body[0].auto_model)
```
The snippet above was provided by Tom Aarsen.
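For anyone else trying this, here is a minimal sketch of how I'm applying it, assuming a trained SetFit checkpoint on the Hub (the checkpoint name below is just an example):

```python
import torch
from setfit import SetFitModel

# Any trained SetFit checkpoint works; this name is just an example.
model = SetFitModel.from_pretrained("lewtun/my-awesome-setfit-model")

# model_body is a SentenceTransformer; its first module wraps the underlying
# Hugging Face transformer, which is the part that gets compiled.
model.model_body[0].auto_model = torch.compile(model.model_body[0].auto_model)

# The first predict call triggers compilation and is slow; later calls are fast.
preds = model.predict(["this looks great", "not a fan of this"])
print(preds)
```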
Does torch.compile also work on CPU? Edit: it looks like it should work on CPU too...
https://pytorch.org/docs/stable/generated/torch.compile.html
Does torch.compile change anything about the accuracy of model inference?
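My understanding is that compiled inference should be numerically very close to eager mode, but not necessarily bit-identical, since kernel fusion can change floating-point rounding. A quick sanity check I can run, sketched below, compares eager and compiled embeddings on the same input (checkpoint name is again just an example):

```python
import torch
from setfit import SetFitModel

model = SetFitModel.from_pretrained("lewtun/my-awesome-setfit-model")
sentences = ["a quick accuracy sanity check"]

# Embeddings from the uncompiled (eager) model body.
eager_emb = model.model_body.encode(sentences, convert_to_tensor=True)

# Compile the underlying transformer, then re-encode the same input.
model.model_body[0].auto_model = torch.compile(model.model_body[0].auto_model)
compiled_emb = model.model_body.encode(sentences, convert_to_tensor=True)

# Expect small floating-point differences at most, not large deviations.
print(torch.allclose(eager_emb, compiled_emb, atol=1e-4))
```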
I see different modes in the docs: the mode can be either "default", "reduce-overhead", "max-autotune", or "max-autotune-no-cudagraphs". So far "reduce-overhead" gives the best results...
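To compare the modes I've been timing predictions with roughly the sketch below (batch contents, batch size, and iteration count are arbitrary):

```python
import time
import torch
from setfit import SetFitModel

sentences = ["benchmark sentence"] * 32

for mode in ["default", "reduce-overhead", "max-autotune"]:
    model = SetFitModel.from_pretrained("lewtun/my-awesome-setfit-model")
    model.model_body[0].auto_model = torch.compile(
        model.model_body[0].auto_model, mode=mode
    )
    model.predict(sentences)  # warm-up: the first call triggers compilation

    start = time.perf_counter()
    for _ in range(10):
        model.predict(sentences)
    elapsed = time.perf_counter() - start
    print(f"{mode}: {elapsed / 10:.4f}s per batch of {len(sentences)}")
```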
Are there any other resources for speeding up SetFit model inference? And where can you run a SetFit model besides TorchServe?
Thanks, Gerald