ollama / ollama-js

Ollama JavaScript library
https://ollama.com
MIT License

question: is gzip applied? #126

Open qinst64 opened 3 months ago

qinst64 commented 3 months ago

I have a very long prompt, and Ollama is running on a remote server. When sending requests over HTTP using ollama-js, is compression (e.g. gzip) already applied so that the transfer speed is optimal?

hopperelec commented 3 months ago

HTTP has no standard way for a client to find out in advance whether a server can decompress a request body. A client can send a compressed body with a "Content-Encoding: gzip" header, but knowing whether the server accepts it would require a preliminary request (otherwise the server can only reject it after the fact, e.g. with 415 Unsupported Media Type). With a specific API, compression support can simply be assumed, but it has to be implemented in the API itself first. The Ollama API is written in Go, which I am not familiar with, so I can't confirm for certain that it doesn't already support compressed requests, but I doubt it.
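For illustration, here is a minimal sketch of what sending a gzip-compressed request body looks like on the client side, using Node's built-in zlib. The helper name gzipJsonRequest is hypothetical, and nothing here is part of ollama-js. It is not confirmed that the Ollama server decompresses such bodies; a server that doesn't understand the coding may reject the request or misread the body.

```typescript
import { gzipSync } from "node:zlib";

// Hypothetical helper: builds fetch() options for a gzip-compressed JSON POST body.
// Assumption: the receiving server understands "Content-Encoding: gzip" on requests.
function gzipJsonRequest(payload: unknown): {
  method: string;
  headers: Record<string, string>;
  body: Uint8Array;
} {
  const json = JSON.stringify(payload);
  // Compress the UTF-8 bytes of the JSON text.
  const compressed = gzipSync(Buffer.from(json, "utf8"));
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Declares the body's coding; the server must decode it before parsing the JSON.
      "Content-Encoding": "gzip",
    },
    body: compressed,
  };
}
```

Usage would be something like `fetch("http://localhost:11434/api/generate", gzipJsonRequest({ model: "llama3.1", prompt: longPrompt }))`, but only against a server known to accept compressed bodies. This is why a general-purpose client can't turn it on by default: the client has no way to know the server supports it.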

Compressing responses would be much easier to implement (at least when the response isn't being streamed), but, again, this would require changes to the Ollama API rather than to ollama-js. I tested this, and the Ollama API does not currently seem to compress responses. However, there might be specific circumstances where it does; I'm not sure.
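One way to check response compression yourself is to ask for gzip via Accept-Encoding and inspect the Content-Encoding header of the response. This is a sketch, not necessarily how the test above was done. It assumes Node 18+ (browsers treat Accept-Encoding as a forbidden header and set it themselves) and Ollama's default address http://localhost:11434. The function names are hypothetical.

```typescript
// Returns the coding the server declared for a response, or "identity" if none.
function responseEncoding(res: Response): string {
  return res.headers.get("content-encoding") ?? "identity";
}

// Hypothetical check against a local Ollama server (assumed default host).
// /api/tags lists local models and returns a small, non-streamed JSON body.
async function checkOllamaCompression(host = "http://localhost:11434"): Promise<string> {
  const res = await fetch(`${host}/api/tags`, {
    headers: { "Accept-Encoding": "gzip" },
  });
  return responseEncoding(res);
}
```

If `await checkOllamaCompression()` returns "identity", the server sent the response uncompressed even though the client offered to accept gzip.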

As for whether implementing it would be a good idea, generation speed is likely to have a much greater effect on overall speed than request transfer time. Even with an upload speed of just 1 Mb/s, sending a prompt that fills the entirety of Llama 3.1's 128K-token context window would take only a few seconds. Even on hardware dedicated to AI (e.g. Groq), processing and responding to a request of that size takes a substantial amount of time as well, and Ollama is intended for consumer hardware, where processing a prompt that long takes far longer than uploading it. So, while compression could be beneficial, I don't think it's really a concern at the moment.
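The upload-time figure depends heavily on assumptions, so here is a back-of-envelope sketch of the arithmetic. It assumes roughly 4 bytes of UTF-8 text per token (a common rule of thumb for English; the real ratio varies by tokenizer and text), ignores JSON and HTTP overhead, and uses 131,072 tokens for Llama 3.1's 128K context. The function name uploadSeconds is hypothetical.

```typescript
// Rough upload time in seconds for a prompt of `tokens` tokens.
// Assumption: bytesPerToken ~= 4 (rule of thumb; not exact for any tokenizer).
function uploadSeconds(tokens: number, bitsPerSecond: number, bytesPerToken = 4): number {
  const bits = tokens * bytesPerToken * 8; // bytes -> bits
  return bits / bitsPerSecond;
}

const LLAMA_3_1_CONTEXT = 128 * 1024; // 131,072 tokens
console.log(uploadSeconds(LLAMA_3_1_CONTEXT, 1_000_000).toFixed(1)); // prints "4.2"
```

Gzip would shrink the payload for typical text, but the saving is on the order of seconds at most, which is small next to the time needed to process a full-context prompt on consumer hardware.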