Open Sixzero opened 8 hours ago
I wonder if we could somehow bring it down 0.3 seconds, what is the time for ai"Hello"echo
in the precompilation.jl
.
There is already mocking like this: https://github.com/svilupp/PromptingTools.jl/blob/main/src/precompilation.jl
It seems that the majority of the time is spent on the HTTP call (as per our Slack chat), so we would need to make sure the right HTTP paths get precompiled, perhaps with a mock server to make sure the HTTP stack gets called.
Did you manage to isolate how much is the compilation vs the API request itself?
Recently I decided to cut down on TTFX of EasyContext.jl and realized that TTFX of PromptingTools needs to improve a lot:
Correct me if I did something wrong here.