latitude-dev / latitude-llm

Latitude is the open-source prompt engineering platform to build, evaluate, and refine your prompts with AI
https://latitude.so
GNU Lesser General Public License v3.0
802 stars 51 forks

Added rate limit in the gateway #467

Closed Ashad-h closed 2 weeks ago

Ashad-h commented 2 weeks ago

Fix #450

TODO

Question

geclos commented 2 weeks ago

This is awesome! Now that you're at it, you might as well include the compression middleware? https://hono.dev/docs/middleware/builtin/compress

geclos commented 2 weeks ago

Do you have a way to benchmark the gateway latency to see the impact of this?

Not for local development, no. I usually recommend using drill for this. I might add a benchmark for the API to the repo at some point.
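Whatever tool generates the load, the before/after comparison comes down to summarizing latency samples. A minimal sketch, assuming the samples are collected in milliseconds from two runs (one without the rate limiter, one with it); `percentile` and `summarize` are hypothetical helpers, not part of drill or of this repo:

```typescript
// Nearest-rank percentile over latency samples (milliseconds).
// Hypothetical helper for illustration only.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply first, then divide, to limit floating-point drift in the rank.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

interface LatencySummary {
  p50: number;
  p95: number;
  max: number;
}

// Condenses one benchmark run into the figures usually compared.
function summarize(samples: number[]): LatencySummary {
  return {
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    max: percentile(samples, 100),
  };
}
```

Comparing `summarize(withoutLimiter).p95` against `summarize(withLimiter).p95` gives a rough view of the overhead at the tail, which is where an extra Redis round trip per request would most likely show up.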

geclos commented 2 weeks ago

Create a different Redis host for the rate limit (I'm using QUEUE_HOST for now)

Use the cache host instead (core exports a cache method that instantiates the Redis client for the cache).

There is no need to set different rate limits per subscription for now.
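For reference, a single shared limit can be sketched as a fixed-window counter. This is a minimal illustration, not the code in this PR: `CounterStore`, `InMemoryStore`, and `checkRateLimit` are hypothetical names, and the in-memory store stands in for Redis so the example is self-contained. A Redis-backed store could implement the same interface with `INCR` plus a key expiry.

```typescript
// Hypothetical store interface; a Redis-backed version could implement it
// with INCR and a key expiry. Nothing here is taken from the repo.
interface CounterStore {
  // Increments the counter for `key` and returns the new count. The counter
  // restarts once `windowMs` has elapsed since the first hit in the window.
  increment(key: string, windowMs: number, now: number): number;
}

class InMemoryStore implements CounterStore {
  private counters = new Map<string, { count: number; expiresAt: number }>();

  increment(key: string, windowMs: number, now: number): number {
    const entry = this.counters.get(key);
    if (!entry || entry.expiresAt <= now) {
      // No live window for this key: start a new one.
      this.counters.set(key, { count: 1, expiresAt: now + windowMs });
      return 1;
    }
    entry.count += 1;
    return entry.count;
  }
}

interface RateLimitResult {
  allowed: boolean;
  remaining: number;
}

// Fixed-window check: at most `limit` requests per key per window.
function checkRateLimit(
  store: CounterStore,
  key: string,
  limit: number,
  windowMs: number,
  now: number = Date.now(),
): RateLimitResult {
  const count = store.increment(key, windowMs, now);
  return { allowed: count <= limit, remaining: Math.max(0, limit - count) };
}
```

Keeping the store behind an interface means the choice of Redis host (queue or cache) stays a wiring detail, and per-subscription limits could be added later by passing a different `limit` per key.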

Ashad-h commented 2 weeks ago

This is awesome! Now that you're at it, you might as well include the compression middleware? https://hono.dev/docs/middleware/builtin/compress

The compress middleware is breaking the token streaming in the playground, and I don't understand why.

geclos commented 2 weeks ago

The compress middleware is breaking the token streaming in the playground, and I don't understand why.

:/ No worries then, I will take a look at it at some point.
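The cause was not established in this thread. One common reason for this kind of breakage in general is that a compression layer buffers output before flushing, so streamed chunks no longer arrive one by one. A possible workaround is to decide per response whether to compress and to skip streaming responses. The sketch below assumes the playground streams with a `text/event-stream` content type, which is unverified here; `shouldCompress` is a hypothetical helper, not part of Hono's API:

```typescript
// Returns true when a response with these headers looks safe to compress.
// Hypothetical helper for illustration only.
function shouldCompress(headers: Record<string, string>): boolean {
  // Header names are case-insensitive, so normalize before looking them up.
  const normalized: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    normalized[name.toLowerCase()] = value.toLowerCase();
  }

  // Assumption: token streaming is served as server-sent events.
  const contentType = normalized["content-type"] ?? "";
  if (contentType.startsWith("text/event-stream")) return false;

  // Already encoded upstream: compressing again would corrupt the body.
  if (normalized["content-encoding"]) return false;

  // `no-transform` asks intermediaries to leave the payload untouched.
  const cacheControl = normalized["cache-control"] ?? "";
  if (/\bno-transform\b/.test(cacheControl)) return false;

  return true;
}
```

A wrapper middleware could call a check like this after the handler runs and only hand the response to the compression step when it returns `true`.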