Open eren23 opened 1 year ago
Weirdly, I tried it myself and it was considerably slower: like 20x slower. But I think that would be a really good section to add, especially with us also adding more info on llama.cpp (which we are starting to benchmark now). Give us 2 weeks and we'll see if we can do it.
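For reference, a rough sketch of how a CPU-vs-GPU comparison like this could be timed with `transformers` (the model ID and prompt here are illustrative placeholders, not our actual benchmark setup):

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2"  # placeholder model choice

def time_generation(device: str, prompt: str = "Explain RAG in one sentence.") -> float:
    """Load the model on the given device and time one short generation."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16 if device == "cuda" else torch.float32,
    ).to(device)
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    start = time.perf_counter()
    model.generate(**inputs, max_new_tokens=64)
    return time.perf_counter() - start

cpu_s = time_generation("cpu")
gpu_s = time_generation("cuda") if torch.cuda.is_available() else float("nan")
print(f"CPU: {cpu_s:.1f}s, GPU: {gpu_s:.1f}s, slowdown: {cpu_s / gpu_s:.0f}x")
```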
I would have appreciated finding this number too. From personal experience (see: https://www.kaggle.com/code/lucasmorin/mistral-7-b-instruct-electricity-co2-consumption) the run time for the same query is 10x, which generally makes CPU usage impractical (or impossible).
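For anyone who wants to reproduce that kind of measurement, here is a hedged sketch using the codecarbon library (the linked notebook may estimate consumption differently; `run_query` is a hypothetical stand-in for the actual CPU inference call):

```python
from codecarbon import EmissionsTracker

def run_query() -> None:
    # Hypothetical placeholder for the actual CPU inference being measured.
    sum(i * i for i in range(10_000_000))

tracker = EmissionsTracker()  # estimates energy use of the enclosed code
tracker.start()
run_query()
emissions_kg = tracker.stop()  # returns the estimated kg of CO2-equivalent
print(f"Estimated emissions: {emissions_kg:.6f} kg CO2-eq")
```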
Running sentence-transformers on a CPU is also viable for many tasks, especially through consumer-grade libraries. People run these models without any GPU acceleration, which might be good to mention in the section.
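For example, a minimal CPU-only sketch (the model name is just a common small checkpoint, not necessarily what any particular project uses):

```python
from sentence_transformers import SentenceTransformer

# Force CPU execution explicitly via the device argument.
model = SentenceTransformer("all-MiniLM-L6-v2", device="cpu")
embeddings = model.encode(
    ["CPU-only embedding works fine for small batches."],
    batch_size=32,
    show_progress_bar=False,
)
print(embeddings.shape)  # (1, 384) for this particular model
```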
We've been using a sentence-transformer since the beginning, and even though it's a small open-source project, all the users I know run it on their CPUs.