Synchronized flush of cache for updated image IDs

mzeinstra commented 9 years ago

Hi,

I've uploaded 2 versions of the Night Watch to compare image qualities

http://dev.embedr.eu/#/Rijksmuseum_SK-C-5
http://dev.embedr.eu/#/Rijksmuseum_wiki_SK-C-5

It seems that the encoding of the first one went horribly wrong :)

[Screenshot, 2015-08-05 17:40:48]

o1da commented 9 years ago

There is no problem with the encoding itself. The problem comes from using one ID, Rijksmuseum_SK-C-5, for more than one image with different URLs. This ID is used in batch no. 2 and no. 4. The tiles of both images with the same ID remain in memcache on the IIIF server, and they get mixed there, because the IIIF server has no concept of triggered cache flushing. I can only flush the cache manually. This could probably be developed or set up differently in #57.
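A toy model of why the tiles end up mixed, purely for illustration: it assumes the tile cache key is built from the image ID and tile position only, with nothing identifying the source URL (the real key format used by the IIIF server is not described in this thread). The file names `batch2.jpg` and `batch4.jpg` are placeholders.

```python
from typing import Callable, Dict, Tuple

# Hypothetical cache key: (image_id, tile_x, tile_y). Note that the source URL
# the tile was rendered from is NOT part of the key.
TileKey = Tuple[str, int, int]


class TileCache:
    """Caches rendered tiles; a tile is rendered only on a cache miss."""

    def __init__(self, render: Callable[[str, int, int], str]) -> None:
        self._render = render
        self._tiles: Dict[TileKey, str] = {}

    def get(self, image_id: str, x: int, y: int) -> str:
        key = (image_id, x, y)
        if key not in self._tiles:
            self._tiles[key] = self._render(image_id, x, y)
        return self._tiles[key]


# Which source image currently backs each ID (changes when a batch re-uses the ID).
source_for_id = {"Rijksmuseum_SK-C-5": "batch2.jpg"}


def render(image_id: str, x: int, y: int) -> str:
    # Stand-in for cutting a tile out of the current JPEG2000 file.
    return f"{source_for_id[image_id]}@{x},{y}"


cache = TileCache(render)
cache.get("Rijksmuseum_SK-C-5", 0, 0)                # a viewer loads one tile of batch 2
source_for_id["Rijksmuseum_SK-C-5"] = "batch4.jpg"   # same ID, different URL
tiles = [cache.get("Rijksmuseum_SK-C-5", x, 0) for x in range(2)]
print(tiles)  # ['batch2.jpg@0,0', 'batch4.jpg@1,0']: one old tile, one new
```

Tiles that were requested before the change are served from the cache, while tiles requested for the first time are rendered from the new source, so the viewer shows a patchwork of both images.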

The best approach is to use one ID with one url all the time.

The same situation can happen if images are reordered in a sequence. #37

mzeinstra commented 9 years ago

Interesting, that is not how I read the Wiki:

'New records, and existing records that changed their "url", field will be added to a background queue that downloads source images and transcodes them to JPEG2000 format appropriate for serving via the IIIF Image service.'

I expected that when I gave a different URI, it would replace the image, not mix the two images.
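The Wiki rule quoted above can be sketched as a simple comparison against the last known URL per ID. This is a hypothetical illustration, not the project's ingest code; `enqueue_changed`, the in-memory queue and the example.org URLs are all assumptions.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Tuple

Job = Tuple[str, str]  # (image_id, source_url)


def enqueue_changed(known_urls: Dict[str, str],
                    records: Iterable[Job],
                    queue: Deque[Job]) -> None:
    """Queue new IDs and known IDs whose URL changed, then remember the new URL."""
    for image_id, url in records:
        if known_urls.get(image_id) != url:   # new record, or "url" field changed
            queue.append((image_id, url))     # -> download + transcode to JPEG2000
            known_urls[image_id] = url


known = {
    "Rijksmuseum_SK-C-5": "http://example.org/batch2.jpg",
    "Example_unchanged": "http://example.org/same.jpg",
}
jobs: Deque[Job] = deque()
enqueue_changed(known, [
    ("Rijksmuseum_SK-C-5", "http://example.org/batch4.jpg"),  # URL changed
    ("Example_unchanged", "http://example.org/same.jpg"),     # skipped
    ("Example_new", "http://example.org/new.jpg"),            # new record
], jobs)
```

The rule only covers re-transcoding the source file. Nothing in it invalidates tiles that are already cached downstream of the new JPEG2000 file, which is the gap this thread is about.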

@klokan do you consider this behaviour by design or as a bug?

klokan commented 9 years ago

What is described in the Wiki does happen. It may just take some time until the new image appears on the web for the user.

The background info:

There is a trade-off between the performance of the image service and the possibility of updating an image under one ID (URL). If you want a fast and scalable service, then you cannot change images for existing identifiers too often, because of caching.

Multiple caches in different places are used for better performance. All of them need to be flushed when an individual image is updated, ideally at the same moment, which is almost impossible across multiple machines:

Cache in the user's web browser (for a defined time period, or until the user closes the browser window or forces a reload). We cannot flush this, and it would be a very bad idea to set it differently than it is now. It means that a user who revisits the image will ALWAYS have a chance to see what you reported above.
Caches in each iiif machine (machines started dynamically by load-balancer):
- memcached (shared between the image server processes): flushing on each iiif machine (or on one centralised memcache service if #57 is developed)
- internal cache in each iipimage/iiifserver process (flushing would require restarting the FastCGI scripts, which may mean a short unavailability of the service on each such update)
- file cache (the copy of the .jp2 file downloaded from S3 into local filesystem for serving).

The caches would need to be flushed bottom-up: first the local S3 file cache on all iiif machines, then the processes, then memcached, then waiting until it happens in users' browsers, etc. The iiif machines would need to receive, via a trigger (webhook), the list of files (IDs or URL prefixes) to flush. No infrastructure for this is proposed in the diagrams in the wiki or in the project specification.
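A minimal sketch of the bottom-up ordering, assuming one hypothetical flush callable per layer (`file_cache`, `process_cache`, `memcached`); the machine names are placeholders. Each layer is flushed on every machine before moving to the layer above, so an upper cache cannot be re-filled from a stale lower one in between. Browser caches are left out because they can only expire.

```python
from typing import Callable, Dict, List, Sequence

# Lowest layer first. If memcached were flushed before the file cache, a request
# arriving in between could re-render a tile from the stale .jp2 copy and put
# the old tile straight back into memcached.
FLUSH_ORDER = ("file_cache", "process_cache", "memcached")

Flusher = Callable[[str, str], None]  # (machine, image_id) -> None, hypothetical


def flush_bottom_up(machines: Sequence[str], image_id: str,
                    flushers: Dict[str, Flusher]) -> List[str]:
    """Flush one image layer by layer on all machines; return the order used."""
    log: List[str] = []
    for layer in FLUSH_ORDER:
        for machine in machines:
            flushers[layer](machine, image_id)
            log.append(f"{layer}:{machine}")
    return log
```

In practice `process_cache` would mean restarting the FastCGI processes, with the short unavailability mentioned above.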

We discussed this internally a month ago, and the result was not to put extra effort in this direction, especially after we had implemented the whole sequences functionality as an extra. BTW, this is related to #37 as well.

The use case of updating the image for a single identifier is very rare, and if it happens, the caches are flushed automatically after a certain time. In an extremely bothersome situation, all the caches can be flushed manually everywhere.
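"Flushed automatically after a certain time" is expiry: each cached entry carries a time-to-live, so the worst-case staleness after an update is bounded by that TTL. memcached's per-item expiration time and the browser's `Cache-Control: max-age` play this role; the sketch below is a generic illustration, and the 3600-second TTL and the key format are hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Entries silently expire ttl_seconds after they were stored."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def set(self, key: str, value: bytes) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[bytes]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: the next request re-renders the tile
            return None
        return value
```

A shorter TTL shrinks the window in which old tiles can appear, at the cost of more re-rendering; this is the same performance trade-off described earlier.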

klokan commented 9 years ago

To solve this properly, a new "/delete" endpoint would need to be added to the running "iiif" servers (a FastCGI inside of supervisord). Such an endpoint would accept a file name (URL prefix) to be flushed from all caches (the local file cache and the local memcached on the machine).
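A minimal sketch of such an endpoint as a plain WSGI app, under several assumptions: the local file cache is a directory whose relative paths start with the URL prefix, the endpoint is `POST /delete?prefix=...`, and `flush_memcached` is a hypothetical callback. memcached has no delete-by-prefix command, so that callback would have to delete known keys or fall back to `flush_all`. Running the app as a FastCGI under supervisord needs a WSGI-to-FastCGI adapter, which is not shown.

```python
import os
from typing import Callable, Dict, Iterable, List
from urllib.parse import parse_qs


def flush_prefix(cache_dir: str, prefix: str) -> List[str]:
    """Delete cached files whose path relative to cache_dir starts with prefix."""
    removed: List[str] = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, cache_dir)
            if rel.startswith(prefix):
                os.remove(path)
                removed.append(rel)
    return sorted(removed)


def make_delete_app(cache_dir: str, flush_memcached: Callable[[str], None]):
    """Build a WSGI app exposing POST /delete?prefix=<URL prefix>."""
    def app(environ: Dict, start_response: Callable) -> Iterable[bytes]:
        headers = [("Content-Type", "text/plain; charset=utf-8")]
        if environ.get("PATH_INFO") != "/delete":
            start_response("404 Not Found", headers)
            return [b"not found\n"]
        if environ.get("REQUEST_METHOD") != "POST":
            start_response("405 Method Not Allowed", headers + [("Allow", "POST")])
            return [b"use POST\n"]
        prefix = parse_qs(environ.get("QUERY_STRING", "")).get("prefix", [""])[0]
        if not prefix:
            start_response("400 Bad Request", headers)
            return [b"missing prefix\n"]
        removed = flush_prefix(cache_dir, prefix)  # 1. local file cache first
        flush_memcached(prefix)                    # 2. then the local memcached
        start_response("200 OK", headers)
        return ["".join(f"{r}\n" for r in removed).encode("utf-8")]
    return app
```

Only files inside `cache_dir` are ever considered, so a malicious prefix cannot delete anything outside the cache directory.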

The task that performs the ingest update operation (a change of the URL for a known ID) would need to call the delete endpoint, with the file to be removed, on all machines started in the load balancer.
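The fan-out from the ingest task could look roughly like this. It assumes a `POST /delete?prefix=...` endpoint on each machine and a list of host names obtained from the load balancer; both are hypothetical, and the `send` parameter exists only so the logic can be exercised without a network.

```python
import urllib.parse
import urllib.request
from typing import Callable, Iterable, List


def post_delete(host: str, prefix: str, timeout: float = 10.0) -> int:
    """Call the (hypothetical) /delete endpoint on one iiif machine."""
    query = urllib.parse.urlencode({"prefix": prefix})
    req = urllib.request.Request(f"http://{host}/delete?{query}",
                                 data=b"", method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def trigger_flush(hosts: Iterable[str], prefix: str,
                  send: Callable[[str, str], int] = post_delete) -> List[str]:
    """Ask every machine behind the load balancer to flush; return failed hosts."""
    failed: List[str] = []
    for host in hosts:
        try:
            status = send(host, prefix)
        except OSError:  # URLError/HTTPError and socket timeouts are OSErrors
            failed.append(host)
            continue
        if status != 200:
            failed.append(host)
    return failed
```

Returning the failed hosts instead of raising lets the ingest task retry them, or take them out of the load balancer, since a single unflushed machine is enough to keep serving mixed tiles.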

mzeinstra commented 9 years ago

Let's mark this as an enhancement.

klokan commented 9 years ago

@o1da

[x] Could you please do a manual brutal flush of the caches and describe step by step in the Installation protocol how to do it?

I expect it means deleting the file cache of s3fs and restarting memcached on every iiif virtual machine running behind the load-balancer.
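A sketch of the two manual steps, generated as one SSH command per machine. The host names and the s3fs cache path (whatever directory `-o use_cache=...` points to) are assumptions, and so is using `service memcached restart`; the actual procedure lives in the installation protocol and is not reproduced here.

```python
import shlex
from typing import List, Sequence

S3FS_CACHE_DIR = "/tmp/s3fs-cache"  # hypothetical: the directory given to -o use_cache


def brutal_flush_commands(hosts: Sequence[str],
                          cache_dir: str = S3FS_CACHE_DIR) -> List[List[str]]:
    """One ssh command per iiif machine: empty the s3fs file cache, then restart memcached."""
    remote = (
        # -mindepth 1 keeps the cache directory itself but also removes hidden entries
        f"sudo find {shlex.quote(cache_dir)} -mindepth 1 -delete"
        " && sudo service memcached restart"
    )
    return [["ssh", host, remote] for host in hosts]
```

Each command could then be run with `subprocess.run(cmd, check=True)`. The file cache is emptied before memcached is restarted, following the bottom-up order above; the in-process caches of the image server are not touched by these two steps.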

It will hurt performance of the service temporarily, but it solves the problem, if it appears.

o1da commented 9 years ago

The caches have been cleaned, and the procedure is described in the installation protocol.

klokantech / embedr

Synchronized flush of cache for updated image IDs #63