ray-project / llm-numbers

Numbers every LLM developer should know

CPU Stats for when it's possible #15

Open eren23 opened 1 year ago

eren23 commented 1 year ago

Running sentence-transformers on a CPU for various tasks is also possible, especially in consumer-grade libraries and applications. People run these models without any GPU acceleration, which might be good to mention in that section.

We have been using a sentence-transformer from the beginning, and even though it's a small open-source project, all the users I know run it on their CPUs.

waleedkadous commented 1 year ago

Weirdly, I tried it myself and it was considerably slower: roughly 20x slower. But I think that would be a really good section to add, especially since we are also adding more info on llama.cpp (which we are starting to benchmark now). Give us 2 weeks and we'll see if we can do it.
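The kind of CPU-vs-GPU comparison being discussed can be sketched with a small stdlib-only timing harness; `encode_fn` is a placeholder for whatever embedding call is being benchmarked (e.g. a sentence-transformers `model.encode`), not part of any library:

```python
import time

def mean_latency(encode_fn, batch, n_runs=5):
    """Return mean seconds per call of encode_fn(batch) over n_runs.

    A warm-up call is made first, since the first invocation often
    includes one-off costs (model load, kernel compilation, caching).
    """
    encode_fn(batch)  # warm-up, not timed
    start = time.perf_counter()
    for _ in range(n_runs):
        encode_fn(batch)
    return (time.perf_counter() - start) / n_runs

# Hypothetical usage, comparing the same query on two devices:
#   slowdown = mean_latency(cpu_model.encode, sentences) / \
#              mean_latency(gpu_model.encode, sentences)
```

Averaging over several runs after a warm-up is what keeps one-off startup costs from inflating a "CPU is Nx slower" number.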

lcrmorin commented 10 months ago

I would have appreciated finding this number too. From personal experience (see: https://www.kaggle.com/code/lucasmorin/mistral-7-b-instruct-electricity-co2-consumption), the run time for the same query is about 10x on CPU, which generally makes CPU usage impractical (or impossible).