Hi,
I'm currently investigating what options we have to optimize SetFit inference and have a few questions:
```python
model.model_body[0].auto_model = torch.compile(model.model_body[0].auto_model)
```
The snippet above was provided by Tom Aarsen.
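For anyone else trying this, here is a minimal sketch of how I'm applying it, assuming a trained SetFit checkpoint on the Hub (the checkpoint name below is just an example):

```python
import torch
from setfit import SetFitModel

# Any trained SetFit checkpoint works; this name is just an example.
model = SetFitModel.from_pretrained("lewtun/my-awesome-setfit-model")

# model_body is a SentenceTransformer; its first module wraps the underlying
# Hugging Face transformer, which is the part that gets compiled.
model.model_body[0].auto_model = torch.compile(model.model_body[0].auto_model)

# The first predict call triggers compilation and is slow; later calls are fast.
preds = model.predict(["this looks great", "not a fan of this"])
print(preds)
```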
Does torch.compile also work on CPU? Edit: it looks like it should work on CPU too...
https://pytorch.org/docs/stable/generated/torch.compile.html
Does torch.compile change anything about the accuracy of model inference?
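My understanding is that compiled inference should be numerically very close to eager mode, but not necessarily bit-identical, since kernel fusion can change floating-point rounding. A quick sanity check I can run, sketched below, compares eager and compiled embeddings on the same input (checkpoint name is again just an example):

```python
import torch
from setfit import SetFitModel

model = SetFitModel.from_pretrained("lewtun/my-awesome-setfit-model")
sentences = ["a quick accuracy sanity check"]

# Embeddings from the uncompiled (eager) model body.
eager_emb = model.model_body.encode(sentences, convert_to_tensor=True)

# Compile the underlying transformer, then re-encode the same input.
model.model_body[0].auto_model = torch.compile(model.model_body[0].auto_model)
compiled_emb = model.model_body.encode(sentences, convert_to_tensor=True)

# Expect small floating-point differences at most, not large deviations.
print(torch.allclose(eager_emb, compiled_emb, atol=1e-4))
```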
I see different modes in the docs: the mode can be either "default", "reduce-overhead", "max-autotune", or "max-autotune-no-cudagraphs". So far "reduce-overhead" gives the best results...
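To compare the modes I've been timing predictions with roughly the sketch below (batch contents, batch size, and iteration count are arbitrary):

```python
import time
import torch
from setfit import SetFitModel

sentences = ["benchmark sentence"] * 32

for mode in ["default", "reduce-overhead", "max-autotune"]:
    model = SetFitModel.from_pretrained("lewtun/my-awesome-setfit-model")
    model.model_body[0].auto_model = torch.compile(
        model.model_body[0].auto_model, mode=mode
    )
    model.predict(sentences)  # warm-up: the first call triggers compilation

    start = time.perf_counter()
    for _ in range(10):
        model.predict(sentences)
    elapsed = time.perf_counter() - start
    print(f"{mode}: {elapsed / 10:.4f}s per batch of {len(sentences)}")
```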
Are there any other resources for speeding up SetFit model inference? And where can you run a SetFit model besides TorchServe?
Thanks, Gerald