TTFX optimization of aigenerate("text"; model)

svilupp / PromptingTools.jl

Streamline your life using PromptingTools.jl, the Julia package that simplifies interacting with large language models.

https://svilupp.github.io/PromptingTools.jl/dev/

MIT License

127 stars 14 forks source link

TTFX optimization of aigenerate("text"; model) #236

Open Sixzero opened 8 hours ago

Sixzero commented 8 hours ago

Recently I decided to cut down on TTFX of EasyContext.jl and realized that TTFX of PromptingTools needs to improve a lot:

time julia -e 'using PromptingTools; @time ai"Hi there"gpt4om;'
[ Info: Tokens: 28 @ Cost: $0.0 in 6.8 seconds
  7.875881 seconds (11.58 M allocations: 785.347 MiB, 5.01% gc time, 99.61% compilation time)
julia -e 'using PromptingTools; @time ai"Hi there"gpt4om;'  8.28s user 0.73s system 106% cpu 8.469 total

Correct me if I did something wrong here.

Sixzero commented 8 hours ago

I wonder if we could somehow bring it down 0.3 seconds, what is the time for ai"Hello"echo in the precompilation.jl.

svilupp commented 5 hours ago

There is already mocking like this: https://github.com/svilupp/PromptingTools.jl/blob/main/src/precompilation.jl

It seems that the majority of the time is spent on the HTTP call (as per our Slack chat), so we would need to make sure the right HTTP paths get precompiled, perhaps with a mock server to make sure the HTTP stack gets called.

Did you manage to isolate how much is the compilation vs the API request itself?