requests-cache / aiohttp-client-cache

An async persistent cache for aiohttp requests
MIT License

.content instance is the same across multiple cached responses (for in-memory backend) #169

Closed rettier closed 1 year ago

rettier commented 1 year ago

The problem

All cached responses are the same instance, shared across all consumers. This can have unexpected side effects when using cached responses without special precautions. For a typical user this is fine, but when adding cache capabilities to an existing library by passing a CachedSession() in place of a normal aiohttp session, this can be hard for the developer to work around.

Notably, this affects the .content response stream. It can only be read by the first and second request (I am assuming the first response is the original one and the second is the cached object; see steps to reproduce below). All further .content.read() calls return an empty string.

Expected behavior

Be able to call content.read() on subsequent cached responses, and have an identical result.

Steps to reproduce the behavior

Here is an example:

In [1]: from aiohttp_client_cache import CachedSession, CacheBackend
In [2]: session = CachedSession(cache=CacheBackend(cache_control=True))

# First response seems to be the not-cached response
In [3]: original_response = await session.get("https://example.com/")
In [4]: await original_response.content.read()
Out[4]: b'<!doctype html>\n<html>\n<head>\n    <title>Example Domain</title>\n\n    <meta charset="utf-8...

# second response is the cached response, we now fully read the content stream
In [5]: cached_response1 = await session.get("https://example.com/")
In [6]: await cached_response1.content.read()
Out[6]: b'<!doctype html>\n<html>\n<head>\n    <title>Example Domain</title>\n\n    <meta charset="utf-8...

# third response is the identical object to the second (cached) response
In [7]: cached_response2 = await session.get("https://example.com/")
In [8]: cached_response1 is cached_response2
Out[8]: True

# now the stream was already fully read, and is empty
In [9]: await cached_response2.content.read()
Out[9]: b''

Workarounds

Outside the library: when using the response yourself, you can reset the private _content attribute so that a new stream is generated:

response = await session.get("https://example.com/")
response._content = None

Inside the library: return a new CachedStreamReader() every time .content is accessed.
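A minimal sketch of that approach, with simplified stand-in classes (these are illustrations under assumed names, not the library's actual implementation): the cached body is stored as bytes, and the content property wraps it in a fresh reader on every access, so each consumer can read the full body.

```python
import asyncio
import io


class CachedStreamReader:
    """Simplified stand-in for a stream reader over a cached body."""

    def __init__(self, body: bytes):
        self._buffer = io.BytesIO(body)

    async def read(self, n: int = -1) -> bytes:
        return self._buffer.read(n)


class CachedResponse:
    """Simplified cached response whose .content is never exhausted."""

    def __init__(self, body: bytes):
        self._body = body

    @property
    def content(self) -> CachedStreamReader:
        # A new reader per access, so every consumer sees the full body
        return CachedStreamReader(self._body)


async def main() -> None:
    response = CachedResponse(b"<!doctype html>...")
    first = await response.content.read()
    second = await response.content.read()
    # Unlike the shared-instance behavior, both reads return the full body
    assert first == second == b"<!doctype html>..."


asyncio.run(main())
```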

BUT NOTE: this is not a perfect replication of the behavior of an original response. Calling response.content.read(10) twice would behave differently from a not-cached response: the cached response would always start from the beginning, while an original response would return the first 10 bytes on the first call and the next 10 bytes (bytes 10-19) on the second.
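The caveat can be illustrated with plain io.BytesIO objects (a stand-in for the stream readers, not the library's code): a persistent stream advances its position across reads, while a fresh reader per access always starts at byte 0.

```python
import io

body = bytes(range(30))  # 30 bytes of sample data

# Original-response behavior: one persistent stream, reads advance the position
stream = io.BytesIO(body)
first = stream.read(10)   # bytes 0-9
second = stream.read(10)  # bytes 10-19
assert first == body[:10] and second == body[10:20]

# Fresh-reader-per-access behavior: every read starts from the beginning
first = io.BytesIO(body).read(10)   # bytes 0-9
second = io.BytesIO(body).read(10)  # bytes 0-9 again
assert first == second == body[:10]
```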

Environment

JWCook commented 1 year ago

Thanks for the bug report. I already fixed similar behavior with the in-memory backend in requests-cache, so it should be easy enough to fix it for this library as well.

JWCook commented 1 year ago

Fixed in v0.9.