Closed cmarqu closed 3 months ago
We have introduced force_cache for the controller class; you can use it instead of writing a custom transport, as suggested in https://github.com/karpetrosyan/hishel/issues/173#issuecomment-1923313628.
When force_cache is enabled, responses are invalidated only when the TTL of the storage expires. So yes, I think you can enable it and call some vacuum function that invalidates all the stale responses and removes them using the API that will be introduced in #241.
Here is how it could look for the file storage:
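import hishel
import httpx


class AsyncVacuumFileStorage(hishel.AsyncFileStorage):
    async def vacuum(self):
        # Hold the storage lock so no request touches the cache during cleanup.
        async with self._lock:
            # vacuum stuff here
            ...


storage = AsyncVacuumFileStorage()
# The storage is async, so it is paired with the async transport classes.
cache_transport = hishel.AsyncCacheTransport(
    transport=httpx.AsyncHTTPTransport(),
    controller=hishel.Controller(force_cache=True),
    storage=storage,
)

# Must be called from inside a coroutine.
await storage.vacuum()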
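As a rough illustration of what the "vacuum stuff" could do, here is a minimal sketch that deletes files older than a given TTL from a cache directory. It uses only the standard library and assumes one file per cached response, with the file's modification time marking when it was stored; hishel's actual on-disk layout may differ. The name vacuum_cache_dir and its parameters are hypothetical and not part of hishel's API.

```python
from __future__ import annotations

import os
import time
from pathlib import Path


def vacuum_cache_dir(
    cache_dir: Path, ttl_seconds: float, now: float | None = None
) -> list[Path]:
    """Delete regular files in cache_dir older than ttl_seconds.

    Hypothetical helper: assumes one file per cached response and that
    the file's mtime reflects when the response was stored.
    Returns the paths that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for entry in sorted(cache_dir.iterdir()):
        if not entry.is_file():
            continue  # leave subdirectories alone
        age = now - entry.stat().st_mtime
        if age > ttl_seconds:
            entry.unlink(missing_ok=True)
            removed.append(entry)
    return removed
```

Inside the vacuum method above, a call such as vacuum_cache_dir(Path(".cache/hishel"), ttl_seconds=3600) could run while the lock is held; the directory path here is an assumption and should match wherever your storage actually writes.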
Can I somehow say "ignore the cached data for this API endpoint but write out new cache data"?
Yes, you can write a custom controller to do that, like so:
from httpcore import Request, Response
import hishel
import httpx
from hishel._serializers import Metadata


class MyStorage(hishel.FileStorage):
    def store(self, key: str, response: Response, request: Request, metadata: Metadata | None = None) -> None:
        # Log every write so you can see that fresh data is still being cached.
        print('storing response', key, response)
        return super().store(key, response, request, metadata)


class IgnoreCacheController(hishel.Controller):
    def construct_response_from_cache(
        self, request: Request, response: Response, original_request: Request
    ) -> Response | Request | None:
        # Returning None treats the stored response as unusable, so the
        # request goes to the network and the new response is stored.
        if request.extensions.get("ignore_cache"):
            return None
        return super().construct_response_from_cache(request, response, original_request)


client = httpx.Client(
    transport=hishel.CacheTransport(
        transport=httpx.HTTPTransport(),
        controller=IgnoreCacheController(),
        storage=MyStorage(),
    )
)

# The first call populates the cache; the second bypasses it on purpose.
client.get("https://hishel.com")
response = client.get("https://hishel.com", extensions={"ignore_cache": True})
assert response.extensions["from_cache"] is False
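To connect this to the list of changed data IDs mentioned in the original question, one option is a small helper that decides per request whether to pass the ignore_cache extension. The sketch below is standard-library only. The URL layout (/items/<id>), the pattern, and the name extensions_for are assumptions made for illustration, so adjust them to the real API.

```python
from __future__ import annotations

import re

# Hypothetical URL layout: the data ID follows "/items/" in the path.
ID_PATTERN = re.compile(r"/items/(?P<id>[^/?#]+)")


def extensions_for(path: str, changed_ids: set[str]) -> dict[str, bool]:
    """Return request extensions that bypass the cache for changed IDs.

    If the path refers to an ID in changed_ids, ask the controller to
    ignore the cached response; otherwise return no extensions.
    """
    match = ID_PATTERN.search(path)
    if match and match.group("id") in changed_ids:
        return {"ignore_cache": True}
    return {}
```

With direct access to the request, this could be used as client.get(url, extensions=extensions_for(url, changed_ids)). With a generated client that hides the individual requests, the same check would have to live somewhere the request passes through, for example in the controller itself.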
Thank you very much for this detailed response. I have now modernized my usage with the controller's force_cache and am looking forward to the new remove method for storage.
I am writing a tool that gets data from an API that always claims the responses are uncacheable, even though the data is mostly rather static. Using the code from https://github.com/karpetrosyan/hishel/issues/173#issuecomment-1923313628, I rewrite the headers to still cache all responses, and this provides a tremendous speedup. As said in #173, I am using code generated from the OpenAPI spec, so I don't have easy access to the single requests but am switching between a caching and a non-caching client that is an input parameter to the generated Python functions.
Now, I'm looking into cache invalidation. The API provides a way to get a list of data IDs that have changed since a certain point in time.
My current thinking goes like this (I'm using FileStorage):
Does that sound workable, or is there an even easier way? Can I somehow say "ignore the cached data for this API endpoint but write out new cache data"?