CoderCowMoo commented 5 months ago

I've tried looking into this myself, but I have no idea where to start with this. The problem is that the llm utility is that it takes 5-6s on startup, doing apparently nothing.

From what I've seen on the internet, this is because when it is packaged into an executable (assumedly by PyInstaller?) is it packaged into a --one-file mode, meaning it has to extract a complete python environment each time.

Other packages which dont have this problem are:

vatsalaggarwal/whisper-cli, using poetry (idk if this is a build system or not).
httpie/cli, using snapcraft (again unfamiliar with this).
huggingface/huggingface_hub, monolithic repo, but the huggingface-cli within doesnt have this issue.

A package which does have this problem is:

Vaibhavs10/insanely-fast-whisper, same 4-6 second delay for a simple --help command. Uses pdm-backend.

Any help with this would be appreciated, I really want to use llm more frequently, and llm-cmd especially.

irthomasthomas commented 5 months ago

I have been wondering about this myself lately. It is also slow on linux.

Compare hitting groq api with curl:

time curl -s -H"Authorization: Bearer $GROQ_API_KEY" https://api.groq.com/openai/v1/chat/completions -H "Content-Type: application/json" -d '{ "model": "mixtral-8x7b-32768", "stream": false, "messages": [ { "role": "system", "content": "" }, { "role": "user", "content": "Hello!" } ], "max_tokens": 1 }' {"id":"chatcmpl-ec8459c2-9e5f-4480-95d5-991b5744986d","object":"chat.completion","created":1712581422,"model":"mixtral-8x7b-32768","choices":[{"index":0,"message":{"role":"assistant","content":"Hello"},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":13,"prompt_time":0.005,"completion_tokens":1,"completion_time":0,"total_tokens":14,"total_time":0.005},"system_fingerprint":"fp_13a4b82d64","x_groq":{"id":"req_01htywxbdze52sphs7m08ba5td"}} curl -s -H"Authorization: Bearer $GROQ_API_KEY" -H -d 0.01s user 0.01s system 2% cpu 0.385 total

0.385s

   ~  time curl -s -H"Authorization: Bearer $GROQ_API_KEY" https://api.groq.com/openai/v1/chat/completions -H "Content-Type: application/json" -d '{ "model": "mixtral-8x7b-32768", "stream": false, "messages": [ { "role": "system", "content": "" }, { "role": "user", "content": "Hello!" } ], "max_tokens": 1 }' {"id":"chatcmpl-23f46a90-c3c5-4345-9f1e-c181d39f6781","object":"chat.completion","created":1712581424,"model":"mixtral-8x7b-32768","choices":[{"index":0,"message":{"role":"assistant","content":"Hello"},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":13,"prompt_time":0.004,"completion_tokens":1,"completion_time":0,"total_tokens":14,"total_time":0.004},"system_fingerprint":"fp_1cc6d039b0","x_groq":{"id":"req_01htywxdpzf81vf7a8jqp6a10h"}} curl -s -H"Authorization: Bearer $GROQ_API_KEY" -H -d 0.01s user 0.01s system 3% cpu 0.324 total

0.324s

Versus llm

time llm -m groq "hello" -o max_tokens 1 Hello llm -m groq "hello" -o max_tokens 1 1.42s user 2.13s system 143% cpu 2.468 total

2.46s

   ~  time llm -m groq "hello" -o max_tokens 1 Hello llm -m groq "hello" -o max_tokens 1 1.38s user 2.16s system 151% cpu 2.347 total

2.347s

   ~  time llm -m groq "hello" -o max_tokens 1 Hello llm -m groq "hello" -o max_tokens 1 1.43s user 2.11s system 148% cpu 2.388

2.388

CoderCowMoo commented 5 months ago

Could you please provide cpu, ram and disk speed specs please? I'm 70% sure that this is just the program unzipping itself temporarily, executing, and then deleting everything, but I have neither the windows know how, or python expertise to either inspect file writes or profile the executable. I do have the time however, so if anyone could guide me in the right direction that would be greatly appreciated.

On Mon, 8 Apr 2024, 11:13 pm THOMAS_THOMAS, @.***> wrote:

I have been wondering about this myself lately. Compare hitting groq api with curl:

time curl -s -H"Authorization: Bearer $GROQ_API_KEY" https://api.groq.com/openai/v1/chat/completions -H "Content-Type: application/json" -d '{ "model": "mixtral-8x7b-32768", "stream": false, "messages": [ { "role": "system", "content": "" }, { "role": "user", "content": "Hello!" } ], "max_tokens": 1 }'

{"id":"chatcmpl-ec8459c2-9e5f-4480-95d5-991b5744986d","object":"chat.completion","created":1712581422,"model":"mixtral-8x7b-32768","choices":[{"index":0,"message":{"role":"assistant","content":"Hello"},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":13,"prompt_time":0.005,"completion_tokens":1,"completion_time":0,"total_tokens":14,"total_time":0.005},"system_fingerprint":"fp_13a4b82d64","x_groq":{"id":"req_01htywxbdze52sphs7m08ba5td"}} curl -s -H"Authorization: Bearer $GROQ_API_KEY" -H -d 0.01s user 0.01s system 2% cpu 0.385 total 0.385s

   ~  time curl -s -H"Authorization: Bearer $GROQ_API_KEY" https://api.groq.com/openai/v1/chat/completions -H "Content-Type: application/json" -d '{ "model": "mixtral-8x7b-32768", "stream": false, "messages": [ { "role": "system", "content": "" }, { "role": "user", "content": "Hello!" } ], "max_tokens": 1 }'

{"id":"chatcmpl-23f46a90-c3c5-4345-9f1e-c181d39f6781","object":"chat.completion","created":1712581424,"model":"mixtral-8x7b-32768","choices":[{"index":0,"message":{"role":"assistant","content":"Hello"},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":13,"prompt_time":0.004,"completion_tokens":1,"completion_time":0,"total_tokens":14,"total_time":0.004},"system_fingerprint":"fp_1cc6d039b0","x_groq":{"id":"req_01htywxdpzf81vf7a8jqp6a10h"}} curl -s -H"Authorization: Bearer $GROQ_API_KEY" -H -d 0.01s user 0.01s system 3% cpu 0.324 total 0.324s Versus llm

time llm -m groq "hello" -o max_tokens 1 Hello llm -m groq "hello" -o max_tokens 1 1.42s user 2.13s system 143% cpu 2.468 total 2.46s

   ~  time llm -m groq "hello" -o max_tokens 1 Hello llm -m groq "hello" -o max_tokens 1 1.38s user 2.16s system 151% cpu 2.347 total 2.347s

   ~  time llm -m groq "hello" -o max_tokens 1 Hello llm -m groq "hello" -o max_tokens 1 1.43s user 2.11s system 148% cpu 2.388 2.388

— Reply to this email directly, view it on GitHub https://github.com/simonw/llm/issues/445#issuecomment-2042727119, or unsubscribe https://github.com/notifications/unsubscribe-auth/AUD4VWRGG4M6JGASBYVTIP3Y4KJWXAVCNFSM6AAAAABFMBE6R2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANBSG4ZDOMJRHE . You are receiving this because you authored the thread.Message ID: @.***>

CoderCowMoo commented 5 months ago

@simonw can you just advise on how the windows executable is being built?

irthomasthomas commented 5 months ago

I'm on linux and it was slow for me too. I've created a new python environment and installed fewer plugins, and, so far, its a lot faster.

time llm -h  ✔  5s   ShellLM  Usage: llm prompt [OPTIONS] [PROMPT] Try 'llm prompt --help' for help.

Error: No such option: -h llm -h 0.36s user 0.08s system 99% cpu 0.444 total

CoderCowMoo commented 5 months ago

I decided to do this and install my plugins one by one, testing execution speed each time.

Installing llm-claude, llm-claude-3, llm-groq and llm-gemini, no apparent slow down.

Installing llm-openrouter has a slight slowdown like 1 second.

Installing llm-sentence-transformers has a massive slowdown giving me the slowdowns on the order of 7-11 seconds.

Installing llm-cmd no noticeable increase above llm-sentence-transformers.

After uninstalling llm-sentence-transformers, execution falls to 2 seconds for help command. There's still a lot more to be done for optimization on this platform.

I tried installing all the same plugins on wsl2 (ubuntu 22.04), and no issues with speed.

TLDR;

llm-sentence-transformers is very slow (+5-9 seconds)
Slow on windows in general, as linux isnt affected by llm-sentence-transformers.
more optimization needed for windows platform.

simonw / llm

Extremely slow startup on windows #445

Compare hitting groq api with curl:

0.385s

0.324s

Versus llm

2.46s

2.347s

2.388