Open anjackson opened 3 years ago
Initial experimentation with Cantaloupe using a stock image,
...and looks like it'll work nicely. Even with just URLs, e.g.
i.e. the % encoding is needed, but then it's okay. The simplest implementation would be to add an endpoint to webrender-api to unpack PWIDs so we can pass in the timestamp. We could allow direct or Base64-encoded forms. However, we need to talk to CDX to determine access rights, so the best idea is to have an internal API on ukwa-access-api that manages the PWID, limits access, etc.
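As a rough sketch of what "unpacking" a PWID might involve: the snippet below splits a PWID into its parts and derives a 14-digit timestamp. It assumes the full UTC `YYYY-MM-DDThh:mm:ssZ` form of the archival time (as in the portico example in this issue) and that the renderer wants a Wayback-style timestamp; `Pwid`, `parse_pwid` and `wayback_timestamp` are hypothetical names, not existing webrender-api or ukwa-access-api functions.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple


class Pwid(NamedTuple):
    """The four variable parts of a urn:pwid identifier."""
    archive_id: str
    archival_time: datetime
    precision: str
    url: str


# The archival time itself contains colons, so a plain split(":") is not
# enough. Assumption: only the full 'YYYY-MM-DDThh:mm:ssZ' form is handled.
_PWID_RE = re.compile(
    r"^urn:pwid:(?P<archive>[^:]+):"
    r"(?P<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z):"
    r"(?P<precision>[a-z]+):"
    r"(?P<url>.+)$"
)


def parse_pwid(pwid: str) -> Pwid:
    """Split a PWID into archive id, time, precision-spec and archived URL."""
    m = _PWID_RE.match(pwid)
    if m is None:
        raise ValueError(f"Not a PWID in the expected form: {pwid!r}")
    when = datetime.strptime(m["time"], "%Y-%m-%dT%H:%M:%SZ")
    return Pwid(m["archive"], when.replace(tzinfo=timezone.utc),
                m["precision"], m["url"])


def wayback_timestamp(pwid: Pwid) -> str:
    """Format the archival time as a 14-digit timestamp (an assumed target format)."""
    return pwid.archival_time.strftime("%Y%m%d%H%M%S")


if __name__ == "__main__":
    p = parse_pwid(
        "urn:pwid:webarchive.org.uk:1995-04-18T15:56:00Z:page:http://portico.bl.uk/"
    )
    print(p.archive_id, wayback_timestamp(p), p.precision, p.url)
```

The access-rights check against CDX would sit between parsing and rendering; it is left out here because its interface is not described in this issue.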
The basic functionality was fairly straightforward. For example (for those with access to DEV only right now):
The PWID has to be URL-encoded or Base64-encoded, so you can't pass e.g. urn:pwid:webarchive.org.uk:1995-04-18T15:56:00Z:page:http://portico.bl.uk/ in directly. Therefore, I added a helper API that constructs the PWID and redirects to the IIIF endpoint, e.g.
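To illustrate why the raw PWID can't go straight into the path, here is a minimal sketch of what such a helper might do: build the PWID from a timestamp and URL, percent-encode it, and slot it into the IIIF URL pattern. The base URL `https://example.org/iiif/2` and the function names are placeholders, not the real DEV endpoint or the actual helper API.

```python
from urllib.parse import quote

# Placeholder base URL; the real service prefix is not shown in this issue.
IIIF_BASE = "https://example.org/iiif/2"


def build_pwid(archive_id: str, timestamp: str, url: str,
               precision: str = "page") -> str:
    """Assemble a PWID URN from its parts (timestamp as 'YYYY-MM-DDThh:mm:ssZ')."""
    return f"urn:pwid:{archive_id}:{timestamp}:{precision}:{url}"


def iiif_url_for_pwid(pwid: str, region: str = "full", size: str = "max",
                      rotation: str = "0", quality: str = "default",
                      fmt: str = "png") -> str:
    """Percent-encode the PWID and place it in the IIIF identifier slot."""
    # safe="" makes quote() escape '/' as well as ':', so the PWID stays
    # a single path segment rather than being split by the server.
    identifier = quote(pwid, safe="")
    return f"{IIIF_BASE}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"


if __name__ == "__main__":
    pwid = build_pwid("webarchive.org.uk", "1995-04-18T15:56:00Z",
                      "http://portico.bl.uk/")
    print(iiif_url_for_pwid(pwid))
```

A real helper would return this URL as an HTTP redirect rather than printing it.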
IDEAS:
If we wrap IIIF around the page screenshotter, we get a lot of the features we'll need, like easy specification of sizes etc, for different purposes.
To make this work, given the format of IIIF URIs, we could use PWIDs and Base64-encode them. e.g.
Becomes...
Which we use as the identifier in the IIIF
{scheme}://{server}{/prefix}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
URLs, like this:
This uses the page-level precision-spec, as this is what makes sense in this context. The prefix of the URL would have to be used to distinguish between the archived and crawl-time images.
This could be done by running a Cantaloupe IIIF image server, which wraps plain image servers nicely, is used by our partners, and has lots of nice features like handling caching. This would pass the Base64 PWID on to a modified webrender-puppeteer, which would decode the pwid64 and render the page at full size and ideally at high resolution. Cantaloupe would then cache this output and handle generating all necessary derivatives.
Cantaloupe can also overlay e.g. the UKWA logo, which might work quite nicely.
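The Base64 round trip described above could look something like the following. This is only a sketch: it uses the URL-safe Base64 alphabet (my assumption, since standard Base64 can emit `/`, which would break the IIIF path), and `pwid_to_pwid64` / `pwid64_to_pwid` are made-up names rather than anything in webrender-puppeteer.

```python
import base64


def pwid_to_pwid64(pwid: str) -> str:
    """Encode a PWID so it can be used as a single IIIF identifier segment."""
    # URL-safe alphabet: '-' and '_' replace '+' and '/'. Padding '=' is kept.
    return base64.urlsafe_b64encode(pwid.encode("utf-8")).decode("ascii")


def pwid64_to_pwid(pwid64: str) -> str:
    """Decode the identifier back into the original PWID (renderer side)."""
    return base64.urlsafe_b64decode(pwid64.encode("ascii")).decode("utf-8")


if __name__ == "__main__":
    pwid = "urn:pwid:webarchive.org.uk:1995-04-18T15:56:00Z:page:http://portico.bl.uk/"
    pwid64 = pwid_to_pwid64(pwid)
    print(pwid64)
    # The renderer would decode this before looking up the archived page.
    assert pwid64_to_pwid(pwid64) == pwid
```

Compared with percent-encoding, the Base64 form is opaque to readers but avoids any ambiguity about how proxies or the image server treat encoded slashes in the path.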
(We could also add http://labs.mementoweb.org/aggregator_config/archivelist.xml and use that to determine the right web archive endpoint for other archives.)