Closed. CoderCowMoo closed this issue 5 months ago.
I have been wondering about this myself lately. It is also slow on Linux. Compare hitting the Groq API with curl:
```
~ time curl -s -H "Authorization: Bearer $GROQ_API_KEY" https://api.groq.com/openai/v1/chat/completions -H "Content-Type: application/json" -d '{ "model": "mixtral-8x7b-32768", "stream": false, "messages": [ { "role": "system", "content": "" }, { "role": "user", "content": "Hello!" } ], "max_tokens": 1 }'
{"id":"chatcmpl-ec8459c2-9e5f-4480-95d5-991b5744986d","object":"chat.completion","created":1712581422,"model":"mixtral-8x7b-32768","choices":[{"index":0,"message":{"role":"assistant","content":"Hello"},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":13,"prompt_time":0.005,"completion_tokens":1,"completion_time":0,"total_tokens":14,"total_time":0.005},"system_fingerprint":"fp_13a4b82d64","x_groq":{"id":"req_01htywxbdze52sphs7m08ba5td"}}
curl -s -H"Authorization: Bearer $GROQ_API_KEY" -H -d  0.01s user 0.01s system 2% cpu 0.385 total

~ time curl -s -H "Authorization: Bearer $GROQ_API_KEY" https://api.groq.com/openai/v1/chat/completions -H "Content-Type: application/json" -d '{ "model": "mixtral-8x7b-32768", "stream": false, "messages": [ { "role": "system", "content": "" }, { "role": "user", "content": "Hello!" } ], "max_tokens": 1 }'
{"id":"chatcmpl-23f46a90-c3c5-4345-9f1e-c181d39f6781","object":"chat.completion","created":1712581424,"model":"mixtral-8x7b-32768","choices":[{"index":0,"message":{"role":"assistant","content":"Hello"},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":13,"prompt_time":0.004,"completion_tokens":1,"completion_time":0,"total_tokens":14,"total_time":0.004},"system_fingerprint":"fp_1cc6d039b0","x_groq":{"id":"req_01htywxdpzf81vf7a8jqp6a10h"}}
curl -s -H"Authorization: Bearer $GROQ_API_KEY" -H -d  0.01s user 0.01s system 3% cpu 0.324 total
```

Versus llm:

```
~ time llm -m groq "hello" -o max_tokens 1
Hello
llm -m groq "hello" -o max_tokens 1  1.42s user 2.13s system 143% cpu 2.468 total

~ time llm -m groq "hello" -o max_tokens 1
Hello
llm -m groq "hello" -o max_tokens 1  1.38s user 2.16s system 151% cpu 2.347 total

~ time llm -m groq "hello" -o max_tokens 1
Hello
llm -m groq "hello" -o max_tokens 1  1.43s user 2.11s system 148% cpu 2.388 total
```
Could you please provide CPU, RAM and disk speed specs? I'm 70% sure that this is just the program unzipping itself temporarily, executing, and then deleting everything, but I have neither the Windows know-how nor the Python expertise to inspect file writes or profile the executable. I do have the time, however, so if anyone could guide me in the right direction that would be greatly appreciated.
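One low-effort starting point for anyone who wants to dig in: CPython can report per-module import timings itself, which should show whether the delay is spent importing code or doing something else (like one-file self-extraction). A sketch, using the stdlib `json` module purely as a stand-in so it runs anywhere; for llm you would point the same `-X importtime` flag at llm's own entry point instead:

```shell
# Log per-import timing for a Python entry point; the report goes to stderr.
# (json here is just a demo target; substitute the real llm entry point,
# assuming it is importable as a module.)
python3 -X importtime -c "import json" 2> import.log

# Each line shows self and cumulative microseconds, '|'-separated; sort by
# the cumulative column to find the heaviest imports.
sort -t'|' -k2 -rn import.log | head -5
```

If the heavy lines are plugin or dependency imports, that points at import cost rather than unzip cost.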
@simonw, can you advise on how the Windows executable is being built?
I'm on Linux and it was slow for me too. I've created a new Python environment and installed fewer plugins, and, so far, it's a lot faster.
```
~ time llm -h
Usage: llm prompt [OPTIONS] [PROMPT]
Try 'llm prompt --help' for help.

Error: No such option: -h
llm -h  0.36s user 0.08s system 99% cpu 0.444 total
```
I decided to do the same, creating a fresh environment and installing my plugins one by one, testing execution speed each time.
- Installing llm-claude, llm-claude-3, llm-groq and llm-gemini: no apparent slowdown.
- Installing llm-openrouter: a slight slowdown, about 1 second.
- Installing llm-sentence-transformers: a massive slowdown, on the order of 7-11 seconds.
- Installing llm-cmd: no noticeable increase beyond llm-sentence-transformers.

After uninstalling llm-sentence-transformers, execution falls to about 2 seconds for the help command. There's still a lot of room for optimization on this platform.
I tried installing all the same plugins on WSL2 (Ubuntu 22.04), and there were no issues with speed.
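The sentence-transformers result fits the pattern that llm appears to import every installed plugin at startup, so a plugin that pulls in a heavy ML stack pays its full import cost on every invocation. A quick way to see the raw cold-import cost of any module; stdlib modules are used here so the snippet runs anywhere, but you could swap in `sentence_transformers` if you have it installed:

```shell
# Measure cold-import wall time per module; heavy dependencies such as torch
# typically dominate and would explain numbers like the 7-11 s seen above.
python3 - <<'EOF'
import importlib
import time

for name in ("json", "decimal", "email"):  # swap in "sentence_transformers" etc.
    start = time.perf_counter()
    importlib.import_module(name)
    print(f"{name}: {time.perf_counter() - start:.4f}s")
EOF
```

If one plugin's imports account for most of the startup time, lazy-importing inside the plugin would likely help more than anything in llm itself.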
TLDR;
I've tried looking into this myself, but I have no idea where to start. The problem is that the llm utility takes 5-6 s on startup, apparently doing nothing.
From what I've seen on the internet, this is because when it is packaged into an executable (presumably by PyInstaller?) it is packaged in --onefile mode, meaning it has to extract a complete Python environment on every run.
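If the one-file theory is right, it is easy to confirm at runtime: PyInstaller sets `sys._MEIPASS` to the temporary extraction directory inside a frozen bundle, so checking for that attribute from within the packaged interpreter tells you whether you are running from a self-extracting bundle. A small check, assuming the Windows binary really is built with PyInstaller (which I haven't verified):

```shell
# Inside a PyInstaller onefile bundle, sys._MEIPASS points at the temp dir the
# bundle extracted itself into; outside one, the attribute does not exist.
python3 -c "import sys; print(getattr(sys, '_MEIPASS', 'not running from a PyInstaller bundle'))"
```

If it does turn out to be PyInstaller onefile, a one-directory build (`--onedir`) would pay the extraction cost once at install time instead of on every launch.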
Other packages which don't have this problem:

- vatsalaggarwal/whisper-cli, using poetry (I don't know if this is a build system or not).
- httpie/cli, using snapcraft (again, unfamiliar with this).
- huggingface/huggingface_hub, a monolithic repo, but the huggingface-cli within doesn't have this issue.

A package which does have this problem:

- Vaibhavs10/insanely-fast-whisper, same 4-6 second delay for a simple --help command. Uses pdm-backend.

Any help with this would be appreciated; I really want to use llm more frequently, and llm-cmd especially.