karpetrosyan / hishel

An elegant HTTP Cache implementation for HTTPX and HTTP Core.
https://hishel.com
BSD 3-Clause "New" or "Revised" License

How to implement manual cache invalidation? #254

Closed cmarqu closed 3 months ago

cmarqu commented 3 months ago

I am writing a tool that gets data from an API that always claims its responses are uncacheable, even though the data is mostly static. Using the code from https://github.com/karpetrosyan/hishel/issues/173#issuecomment-1923313628, I rewrite the headers so that all responses are cached anyway, and this provides a tremendous speedup. As said in #173, I am using code generated from the OpenAPI spec, so I don't have easy access to the individual requests; instead I switch between a caching and a non-caching client, which is passed as an input parameter to the generated Python functions.

Now, I'm looking into cache invalidation. The API provides a way to get a list of data IDs that have changed since a certain point in time.

My current thinking goes like this (I'm using FileStorage):

Does that sound workable, or is there an even easier way? Can I somehow say "ignore the cached data for this API endpoint but write out new cache data"?

karpetrosyan commented 3 months ago

We have introduced force_cache for the controller class; you can use it instead of writing a custom transport as suggested in https://github.com/karpetrosyan/hishel/issues/173#issuecomment-1923313628.

When force_cache is enabled, responses are invalidated only when the TTL of the storage expires. So yes, I think you can enable it and call some vacuum function that invalidates all the stale responses and removes them using the API that will be introduced in #241.

Here is how it could look for the file storage:

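import asyncio

import hishel
import httpx


class AsyncVacuumFileStorage(hishel.AsyncFileStorage):

    async def vacuum(self):
        # Hold the storage lock so the cleanup cannot race with cache reads and writes.
        async with self._lock:
            # vacuum stuff here
            ...


async def main():
    storage = AsyncVacuumFileStorage()
    # The storage is async, so it is paired with the async cache transport.
    cache_transport = hishel.AsyncCacheTransport(
        transport=httpx.AsyncHTTPTransport(),
        controller=hishel.Controller(force_cache=True),
        storage=storage,
    )

    await storage.vacuum()


asyncio.run(main())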
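
The body of the vacuum method is left as a placeholder above. As one possible shape for it, here is a minimal, stdlib-only sketch that deletes cache files older than a TTL, judged by modification time. It assumes one file per cache entry in a flat directory; `remove_stale_files`, `cache_dir`, and `ttl_seconds` are hypothetical names and not part of hishel's API.

```python
import time
from pathlib import Path
from typing import List, Optional


def remove_stale_files(
    cache_dir: Path, ttl_seconds: float, now: Optional[float] = None
) -> List[Path]:
    """Delete regular files in cache_dir whose mtime is older than ttl_seconds.

    Hypothetical helper: assumes one cache entry per file in a flat directory.
    Returns the paths that were removed.
    """
    now = time.time() if now is None else now
    removed: List[Path] = []
    for path in sorted(cache_dir.iterdir()):
        if not path.is_file():
            continue  # skip subdirectories and other non-file entries
        age = now - path.stat().st_mtime
        if age > ttl_seconds:
            path.unlink()
            removed.append(path)
    return removed
```

If something like this were called from a storage subclass, it would presumably need to run while the storage lock is held, as in the skeleton above.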

Can I somehow say "ignore the cached data for this API endpoint but write out new cache data"?

Yes, you can write a custom controller to do that, like so:

from httpcore import Request, Response
import hishel
import httpx

from hishel._serializers import Metadata

class MyStorage(hishel.FileStorage):

    def store(self, key: str, response: Response, request: Request, metadata: Metadata | None = None) -> None:
        print('storing response', key, response)
        return super().store(key, response, request, metadata)

class IgnoreCacheController(hishel.Controller):
    def construct_response_from_cache(
        self, request: Request, response: Response, original_request: Request
    ) -> Response | Request | None:
        # Returning None marks the cached response as unusable, so the request
        # is sent to the server and the fresh response is stored again.
        if request.extensions.get("ignore_cache"):
            return None

        return super().construct_response_from_cache(request, response, original_request)

client = httpx.Client(
    transport=hishel.CacheTransport(
        transport=httpx.HTTPTransport(),
        controller=IgnoreCacheController(),
        storage=MyStorage(),
    )
)

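# The first request populates the cache.
client.get("https://hishel.com")
# The second request bypasses the cached entry but still writes a new one.
response = client.get("https://hishel.com", extensions={"ignore_cache": True})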
assert response.extensions["from_cache"] is False
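
To connect this with the original question: if the API reports which data IDs have changed, the `ignore_cache` extension could be set only for those IDs, so unchanged items keep coming from the cache while changed ones are refetched and rewritten. A minimal sketch, assuming the `IgnoreCacheController` above and calling code that can pass extensions per request; `extensions_for` and the example IDs are hypothetical.

```python
from typing import AbstractSet, Any, Dict


def extensions_for(item_id: str, changed_ids: AbstractSet[str]) -> Dict[str, Any]:
    """Build request extensions that bypass the cache only for changed IDs."""
    if item_id in changed_ids:
        return {"ignore_cache": True}
    return {}


# Hypothetical IDs reported as changed by the API.
changed = {"item-1", "item-7"}
refresh = extensions_for("item-1", changed)
cached = extensions_for("item-2", changed)
```

The result could then be passed along as `client.get(url, extensions=extensions_for(item_id, changed))`.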

cmarqu commented 3 months ago

Thank you very much for this detailed response. I have modernized my usage with the controller's force_cache now and am looking forward to the new remove method for storage.