Open scossu opened 7 years ago
I would very much like to get this working, as we’re seeing a small number of 500s from our Loris instance which are triggered by the SimpleHTTPResolver source dropping the connection. If we have ETag caching, we’d reduce the load on our HTTP source.
The simplest thing would be to store the ETag in a JSON file; something like:
# etag.json
{
  "source": "https://private.myhttpsource.org/V1234.jpg",
  "value": "0123456789abcdef"
}
which lives in the HTTP resolver cache alongside the image itself. (The HTTP resolver cache has a directory per image.)
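As a rough sketch of how that file could be read and written, assuming one cache directory per image as described above: the helper names `save_etag` and `load_etag` and the `ETAG_FILENAME` constant are hypothetical and not part of Loris.

```python
import json
from pathlib import Path

# Hypothetical file name, taken from the example above.
ETAG_FILENAME = "etag.json"


def save_etag(cache_dir, source, value):
    """Write the ETag record into the image's cache directory."""
    record = {"source": source, "value": value}
    path = Path(cache_dir) / ETAG_FILENAME
    path.write_text(json.dumps(record), encoding="utf-8")


def load_etag(cache_dir):
    """Return the stored record, or None if it is missing or unreadable."""
    path = Path(cache_dir) / ETAG_FILENAME
    try:
        record = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing file, unreadable file, or invalid JSON: treat as "no ETag".
        return None
    if not isinstance(record, dict):
        return None
    if "source" not in record or "value" not in record:
        return None
    return record
```

Returning `None` for a missing or corrupt file means the caller can treat every failure the same way: as if no ETag had been cached.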
So the logic for fetching an image becomes something like:
if image_is_in_cache:
    old_etag = load_etag_from_json_cache()
    new_etag = get_etag_from_head_request()
    if (
        (old_etag.source == new_etag.source) and
        (old_etag.value == new_etag.value)
    ):
        return cached_image()
    else:
        fetch_image()
else:
    fetch_image()
I’d probably tweak the logic to shortcut the HEAD request if you know the ETags aren’t going to match (e.g. if you don’t have a cached ETag), but it gives the general idea.
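A minimal sketch of that logic with the shortcut included might look like the following. The function name `resolve_image` and all of its parameters are hypothetical; the HEAD request, the fetch, and the cache read are passed in as callables so the decision logic stays independent of any HTTP library.

```python
def resolve_image(source_url, cached_etag, head_etag, fetch_image, cached_image):
    """Return the cached image if its ETag is unchanged, otherwise refetch.

    cached_etag:  dict with "source" and "value" keys, or None if nothing
                  usable is cached.
    head_etag:    callable(url) -> ETag string or None (one HEAD request).
    fetch_image:  callable(url) -> freshly fetched image.
    cached_image: callable() -> image from the local cache.
    """
    # Shortcut: without a cached ETag for this source the comparison
    # cannot succeed, so skip the HEAD request entirely.
    if cached_etag is None or cached_etag.get("source") != source_url:
        return fetch_image(source_url)

    current = head_etag(source_url)
    if current is not None and current == cached_etag.get("value"):
        return cached_image()
    return fetch_image(source_url)
```

With the `requests` library, `head_etag` could be something like `lambda url: requests.head(url).headers.get("ETag")`, though error handling and redirects would need more care in practice.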
What do other people think of this suggestion? I’m particularly interested in @bcail and @scossu’s thoughts, but other opinions welcome.
I won’t write or deploy this before the New Year, but it’ll probably be near the top of my todo list when I get back.
So the goal here is for Loris to automatically update its cached source images by checking the source HTTP server for an update on each request? I think Loris currently just checks whether a source image is in the cache: if it is, Loris uses it and doesn't hit the source HTTP server at all.
If we go in this direction, I like the idea of having a configuration option. That would let users turn it off if the source server doesn't have ETags or if they don't want the performance hit of so many requests to the source server.
We implemented this in production for a period of time: https://github.com/aic-collections/loris/commit/3e3a67372fa11aac3373796acc87a94db7f227a5
This ticket is to implement an entity tag (ETag)–based cache in the HTTP resolver (https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.26)
Not all HTTP servers support this feature so this should be a configurable option.
Proposed implementation:
- Add a configuration option to SimpleHTTPResolver to enable the ETag-based cache.
- Add a configuration option to SimpleHTTPResolver to specify the key-value store connection.
- Modify SimpleHTTPResolver to implement the following process:
  1. Loris sends a request with an If-None-Match header to the source image server, whose value is the ETag stored in the KV store.
  2. If the ETag matches, the source server responds with 304 Not Modified. In this case Loris uses the cached image (if an ETag is present in the KV store, it is assumed that a cached image is present).
  3. If the ETag does not match, or none is stored (in which case no If-None-Match header is sent by Loris), the expected response would be 2xx for a successful retrieval.
Depending on coding convenience, this may be a better fit for a separate resolver (subclass of SimpleHTTPResolver). In that case the current caching mechanism can be bypassed completely in favor of this.
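The conditional-request process proposed above could be sketched roughly as follows. This is an illustration under stated assumptions, not Loris code: `fetch_with_etag` and its parameters are hypothetical, the KV store is modelled as a plain dict keyed by image UID, and `http_get` stands in for whatever HTTP client the resolver uses, returning a `(status, headers, body)` tuple.

```python
def fetch_with_etag(uid, url, kv_store, http_get, read_cached, write_cached):
    """Fetch an image, reusing the cached copy when the source says 304.

    kv_store:     dict-like mapping of image UID -> ETag (assumed interface).
    http_get:     callable(url, headers) -> (status, response_headers, body).
    read_cached:  callable(uid) -> cached image bytes.
    write_cached: callable(uid, body), stores the image in the cache.
    """
    request_headers = {}
    etag = kv_store.get(uid)
    if etag is not None:
        # Only send If-None-Match when an ETag is stored for this image.
        request_headers["If-None-Match"] = etag

    status, response_headers, body = http_get(url, request_headers)

    if status == 304:
        # Not modified: a stored ETag implies a cached image is present.
        return read_cached(uid)

    if 200 <= status < 300:
        write_cached(uid, body)
        new_etag = response_headers.get("ETag")
        if new_etag is not None:
            kv_store[uid] = new_etag
        else:
            # The source sent no ETag, so drop any stale one.
            kv_store.pop(uid, None)
        return body

    raise RuntimeError("unexpected status %d from %s" % (status, url))
```

Compared with a separate HEAD request, this needs only one round trip per image request, because the source server performs the ETag comparison itself.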
A purge function could be implemented separately. In that case Loris would have to delete the UID-ETag pair together with the cached image in order to be able to fetch the content again.
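A purge along those lines might look like this minimal sketch. The name `purge`, the dict-like `kv_store`, and the single-file `cache_path` are all assumptions made for illustration.

```python
from pathlib import Path


def purge(uid, kv_store, cache_path):
    """Remove the UID-ETag pair and the cached image for one UID."""
    # Delete the ETag first: if the process dies between the two steps,
    # the store never holds an ETag without a matching cached image.
    kv_store.pop(uid, None)
    # missing_ok requires Python 3.8 or later.
    Path(cache_path).unlink(missing_ok=True)
```

Both steps tolerate a missing entry, so calling `purge` twice for the same UID is harmless.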