[Open] shyoulala opened this issue 4 months ago
The large model is Gemma 2 9B.
Training can be accelerated, but can inference also be accelerated in this scenario?
Yes, inference is 2x faster via Unsloth. However, batched inference is entirely matrix-multiplication bound, so the speedup will be much smaller.
I am using AutoModelForSequenceClassification to classify with a large model. Can I use this library, and if so, how? Also, if my output is only a single token and I run batched inference, will this library still provide acceleration? Thank you for your response.
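For reference, a minimal sketch of the batched-classification setup being described, using plain `transformers` (Unsloth's documented loader, `FastLanguageModel`, targets causal LMs, so whether a sequence-classification head is supported is exactly the open question here). The model name, label count, and batch size below are placeholder assumptions, not values from this thread:

```python
def batches(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def classify(texts, model_name="google/gemma-2-9b", num_labels=2, batch_size=8):
    """Batched sequence classification: one forward pass per batch, no decoding loop.

    Heavyweight dependencies are imported lazily so the chunking helper above
    stays importable on its own. `model_name`/`num_labels` are illustrative.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=num_labels, torch_dtype=torch.bfloat16, device_map="auto"
    )
    model.eval()

    preds = []
    for chunk in batches(texts, batch_size):
        enc = tokenizer(chunk, padding=True, truncation=True, return_tensors="pt").to(model.device)
        with torch.no_grad():
            # A classification head emits logits directly; since the "output is
            # only one token" (a label), there is no autoregressive generation step.
            logits = model(**enc).logits
        preds.extend(logits.argmax(dim=-1).tolist())
    return preds
```

Because classification needs only a single forward pass per batch, the workload is dominated by the same matrix multiplications noted in the answer above, which is why batching alone already captures most of the available throughput.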