hamelsmu / hamel-site

Repo for Hamel's Professional Website
http://hamel.dev/

Llama.cpp benchmarks #10

Open lukestanley opened 9 months ago

lukestanley commented 9 months ago

Hi Hamel, you've probably heard of llama.cpp. I saw your benchmarks in 03_inference.ipynb, but I couldn't find any mention of llama.cpp there. I believe it can run on the same GPU. I don't have a fancy GPU like that, so I can't readily benchmark it the same way. TheBloke publishes models in its GGUF format: https://huggingface.co/TheBloke/Llama-2-7B-GGUF

Maybe you didn't consider it the same class of tool? But it can also run a server, including an OpenAI-style HTTP API (a rough sketch of benchmarking through it is below). I found these benchmarks, which show MLC ahead of llama.cpp, but I wonder whether llama.cpp had been set up to use the GPU correctly. It might be worthwhile to compare against the latest version:

https://github.com/mlc-ai/mlc-llm/issues/15#issuecomment-1664226006
https://github.com/mlc-ai/llm-perf-bench

Anyway, thanks for your analysis, pivotal stuff! @hamelsmu
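For context, here's roughly what I mean by benchmarking it through the HTTP API. This is only a sketch: it assumes a llama.cpp server is already running locally and exposes an OpenAI-compatible chat completions route; the model name, port, and endpoint path are all assumptions on my part, not something from your notebook.

```python
import time

import requests  # pip install requests

# Assumes a llama.cpp server was started locally with something like
#   ./server -m llama-2-7b.Q4_K_M.gguf --port 8080
# and that the build exposes an OpenAI-compatible route (hypothetical setup).
BASE_URL = "http://localhost:8080/v1/chat/completions"

payload = {
    "model": "llama-2-7b",  # placeholder; the server answers with whatever model it loaded
    "messages": [{"role": "user", "content": "Explain GGUF quantization in two sentences."}],
    "max_tokens": 200,
    "temperature": 0,
}

# Time a single request end to end as a crude throughput measurement.
start = time.perf_counter()
resp = requests.post(BASE_URL, json=payload, timeout=120)
elapsed = time.perf_counter() - start
resp.raise_for_status()

data = resp.json()
print(data["choices"][0]["message"]["content"])

# If the server reports token usage, derive a rough tokens/second figure,
# comparable in spirit to the numbers in 03_inference.ipynb.
completion_tokens = data.get("usage", {}).get("completion_tokens", 0)
if completion_tokens:
    print(f"{completion_tokens} tokens in {elapsed:.2f}s "
          f"~ {completion_tokens / elapsed:.1f} tok/s")
```

A single request like this only captures one-shot latency; a fairer comparison against the other tools would average several runs and test a few prompt/output lengths, but it shows the idea.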