Open douglas-raillard-arm opened 5 months ago
We definitely need some memory control here, so I think that would be great. Before we begin, we should have answers to the following questions: should this go into the storage itself (e.g. `AsyncInMemoryStorage`) or into the `LFUCache` class?

I think there are two aspects to this:
This is only requested for the `InMemoryStorage` (because memory is usually more limited than disk space), but it could also apply to other storages. After all, disk is not infinite, either.

In fact, even the storage doesn't have complete information about how much space a cached response will really take to store, because the cache "backend" (`LFUCache`, SQLite, file system) will probably take a bit more space than just the number of bytes of the serialized response. E.g., a file system cache will always take up a multiple of the block size plus the necessary inodes, SQLite needs to integrate it into its index, and the dict in the `LFUCache` needs to store pointers to the stored responses and an entry in the `self.freq_count` dict, etc. So, in a perfect implementation, the storage could ask its backend how much more space would be taken up if a response were stored. Implementing this would probably get pretty complex and maybe brittle.
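To illustrate the point about per-entry overhead, here is a quick CPython-specific sketch: even `sys.getsizeof` on the raw value reports more than the payload, and the backend's own bookkeeping structures come on top of that.

```python
import sys

body = b"x" * 100  # a 100-byte "response" payload

# The bytes object itself carries a header beyond its payload.
payload_size = len(body)           # 100
object_size = sys.getsizeof(body)  # larger than 100 on CPython

# The backend's bookkeeping (dict slots, frequency counters, ...)
# adds yet more memory per entry.
cache = {"key": body}
freq_count = {"key": 1}
bookkeeping = sys.getsizeof(cache) + sys.getsizeof(freq_count)

print(object_size > payload_size)  # True
print(bookkeeping > 0)             # True
```

So any size accounting based on the serialized response alone will systematically undercount the real footprint.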
Since it's only requested here for the `InMemoryStorage`, and that's probably where it's most relevant, we could implement the following naive version:

- Give the `InMemoryStorage` two new parameters: `max_cached_response_size` and `max_cache_size`, which are passed to the `LFUCache` exactly like the `capacity` is already.
- In `LFUCache.put`, reject the value if its size is greater than `max_cached_response_size`. (This will require assuming the value's type. Currently, it's a generic type `V`, but we need to be sure that it's a primitive type like `str` or `bytes` if we want to use `sys.getsizeof`.)
- Check whether storing the value would exceed `max_cache_size`. If yes, evict the least frequently used entry, like it's happening right now when exceeding `capacity`.
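A minimal sketch of what those two checks could look like (the class and the `max_cached_response_size`/`max_cache_size` parameters are the proposed additions, not existing API, and a real `LFUCache` keeps more elaborate frequency bookkeeping):

```python
import sys

class SizeAwareLFUCache:
    """Toy LFU cache with the two proposed size limits."""

    def __init__(self, capacity, max_cached_response_size, max_cache_size):
        self.capacity = capacity
        self.max_cached_response_size = max_cached_response_size
        self.max_cache_size = max_cache_size
        self.cache = {}        # key -> value (assumed str/bytes)
        self.freq_count = {}   # key -> access frequency
        self.current_size = 0

    def put(self, key, value):
        size = sys.getsizeof(value)  # assumes a primitive value type
        if size > self.max_cached_response_size:
            return False  # too big to cache at all
        # Evict least frequently used entries until the new value fits,
        # mirroring what already happens when `capacity` is exceeded.
        while self.cache and (
            len(self.cache) >= self.capacity
            or self.current_size + size > self.max_cache_size
        ):
            lfu_key = min(self.freq_count, key=self.freq_count.get)
            self.current_size -= sys.getsizeof(self.cache.pop(lfu_key))
            del self.freq_count[lfu_key]
        self.cache[key] = value
        self.freq_count[key] = 1
        self.current_size += size
        return True

    def get(self, key):
        if key in self.cache:
            self.freq_count[key] += 1
            return self.cache[key]
        return None
```

For example, with `max_cached_response_size=200`, a 500-byte body is rejected outright, while repeated 100-byte bodies trigger eviction once `max_cache_size` would be exceeded.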
Note that `capacity` and `max_cached_response_size` together effectively also define a maximum cache size.

What do you think? Should I try including a `max_cached_response_size`?
Apart from that, maybe we should advise people using Redis to set `maxmemory` and `maxmemory-policy`?
I think it would be reasonable to have something like that:

- Track the size of the data being cached. Trying to account for the actual storage size is doomed, as it's impossible to predict. Even if you start bean-counting things like the number of inodes, it will be completely thrown off by e.g. compression at the fs level. The only sane alternative I can think of would be a back-pressure mechanism where the storage reports the free space left. The generic code can take a "measurement" before and after storing each entry, so it knows exactly how big each one is. The downside is that it's impossible to predict an entry's size before storing it, and polling for available space might be expensive depending on the storage implementation.
- Have a hard limit and a headroom threshold. When the soft limit (`<hard limit> - <headroom>`) is reached, you start evicting. The headroom is used to cache the next request. If the next request turns out to be even larger, you can abort caching it mid-way. If you only have a single limit, you will end up having to evict older entries at the same time as processing the current request when it pushes the cache over the limit, or abort caching the current request until you have evicted older entries. That will either cause delays in handling requests (possibly messing with timeouts), or fail to cache some requests "randomly" from the user's point of view.
Having the storage report the space left might work quite well with an implementation that tries to preserve some headroom, and it would adapt to changing conditions in shared-storage situations (e.g. a filesystem).
`InMemoryStorage` allows controlling the number of cached requests, but not the size of the cached data. A single GET request for a large file (a few GB) seems to consume an unreasonable amount of memory, which cannot be controlled via the number of cached requests.