mosecorg / mosec

A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine
https://mosecorg.github.io/mosec/
Apache License 2.0
791 stars · 60 forks

feat: support server side response cache #395

Open kemingy opened 1 year ago

kemingy commented 1 year ago

Describe the feature

refer to:

Some ML models might benefit from the cache.

As for the storage part, I think ideally we should support both local and remote cache.

Why do you need this feature?

No response

Additional context

No response

AlexXi19 commented 1 year ago

Hey Keming, I'm interested in taking a look at this issue. I briefly looked into some Rust crates for this feature and found this crate. It seems to support a Redis cache, a sized cache, and a timed cache (although I don't believe they offer a timed + sized cache). My first thought would be to add an axum middleware to handle the caching logic. What are your thoughts on this?

kemingy commented 1 year ago

> Hey Keming, I'm interested in taking a look at this issue. I briefly looked into some Rust crates for this feature and found this crate. It seems to support a Redis cache, a sized cache, and a timed cache (although I don't believe they offer a timed + sized cache). My first thought would be to add an axum middleware to handle the caching logic. What are your thoughts on this?

I think this PR should come with a benchmark. I don't know if this lib fits our requirements.

I don't know how it handles the cache key, since the key/value could be a huge image (e.g., 3 x 1000 x 1000 f32). The benchmark should include different key/value types: a simple string, an image, an embedding, etc.

AlexXi19 commented 1 year ago

Good point. Do you think the cache should be aware of the exact content type?

kemingy commented 1 year ago

> Good point. Do you think the cache should be aware of the exact content type?

No, because we don't really parse the HTTP request body on the Rust side. I listed different types of data just because their sizes are different.

kemingy commented 1 year ago

For the benchmark, you can check https://github.com/tensorchord/inference-benchmark/tree/main/benchmark