This will take a lot more time than expected. A quick prototype proved the approach is feasible and could dramatically simplify the S3 plugin, but a full implementation won't be done terribly quickly.
The filesystem is assumed to be the source of all images and info.json overrides. It's also used directly in the DZI handlers instead of having those call the appropriate IIIF handlers; since DZI support was an experimental shim just to see if it was doable, I guess this shouldn't surprise me.
I think the code that registers decoders is probably where we'll want to fix this. Instead of just registering decoding functions by file extension, we should also register image readers (or something like them) that are processed in order, attempting the filesystem only if none of the other readers (plugins?) match on the id. That would, for instance, let the S3 plugin register its reader and respond whenever an IIIF id starts with "s3:". The reader would know how to read the image resource and return its info.json response, something like this:
// Streamer is the stream of bytes a decoder reads an image from; Free releases
// whatever backs it (file handle, S3 buffer, etc.).
type Streamer interface {
	io.ReadSeeker
	Free() error
}

// ImageReader knows how to locate an image resource and produce its info.json data.
type ImageReader interface {
	Stream() Streamer
	GetInfo() *iiif.Info
}
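
To make the registration idea a bit more concrete, here's a rough sketch of how plugin readers might be registered and looked up. Every name here (RegisterImageReader, FindReader, newFilesystemReader, newS3Reader) is hypothetical, and it assumes the interfaces above end up living in the iiif package:

// ReaderLookup is what a plugin registers: given a IIIF id, return an
// ImageReader for it, or nil if the plugin doesn't handle that id.
type ReaderLookup func(id string) ImageReader

var readerLookups []ReaderLookup

// RegisterImageReader (hypothetical) adds a lookup; lookups are consulted in
// registration order.
func RegisterImageReader(fn ReaderLookup) {
	readerLookups = append(readerLookups, fn)
}

// FindReader (hypothetical) asks each registered lookup in turn and falls back
// to reading from the filesystem when nothing else claims the id.
func FindReader(id string) ImageReader {
	for _, fn := range readerLookups {
		if r := fn(id); r != nil {
			return r
		}
	}
	return newFilesystemReader(id) // hypothetical default reader
}

The S3 plugin's registration would then be something like:

RegisterImageReader(func(id string) ImageReader {
	if strings.HasPrefix(id, "s3:") {
		return newS3Reader(id) // hypothetical constructor in the plugin
	}
	return nil
})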
Decoders would need to change as well. The IIIFImageDecoder interface currently lives in the main package, which would be awful to use from plugins; the iiif package probably makes more sense. Decoder registration would also have to change so that the decode function takes a Streamer instead of a filename.
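
For example, registration might end up looking something like the following; the names and signatures are speculative, but the key change is that the decode function gets a Streamer rather than a path:

// DecodeFunc builds a decoder from an open Streamer rather than a filename, so
// the bytes can come from disk, S3, RAM, or anywhere else.
type DecodeFunc func(s Streamer) (IIIFImageDecoder, error)

var decoders = make(map[string]DecodeFunc)

// RegisterDecoder (hypothetical) maps a format hint (file extension today) to a
// stream-based decode function.
func RegisterDecoder(ext string, fn DecodeFunc) {
	decoders[ext] = fn
}

// NewDecoder (hypothetical) picks a decoder by format hint and hands it the stream.
func NewDecoder(ext string, s Streamer) (IIIFImageDecoder, error) {
	fn, ok := decoders[ext]
	if !ok {
		return nil, fmt.Errorf("no decoder registered for %q", ext)
	}
	return fn(s)
}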
Some of the streaming work I've done is in the feature/streaming-jp2s branch. It's messy and broken, but it's got the original prototype work that solved one of the tougher bits of this (interacting with OpenJPEG's C streaming APIs).
The above info will undoubtedly prove not to be entirely correct if/when we implement it, but I hope it at least gives us a bit of direction.
OpenJPEG has API functions for reading from a stream rather than opening a file on disk. Implementing this may not be trivial, but it shouldn't be too bad, and it could open us up to some performance improvements if we wanted to cache a small selection of the most recently requested JP2s in memory, read from S3 directly instead of copying the file to disk first, etc.
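
Nothing below is RAIS code; it's just a sketch of how OpenJPEG's opj_stream_* calls (which are the real 2.x API) could be fed from memory via cgo. To keep it short it buffers the whole JP2 in C memory rather than truly streaming; wiring the callbacks through to a Go Streamer requires exported Go callbacks, which is the messier bit the feature branch pokes at. The package name, NewStream, and the pkg-config/include lines are assumptions about the local install.

// Package jp2stream is a sketch only.
package jp2stream

/*
#cgo pkg-config: libopenjp2
#include <stdlib.h>
#include <string.h>
#include <openjpeg.h>

// Cursor over a malloc'd copy of the JP2 bytes.
typedef struct {
	OPJ_UINT8 *data;
	OPJ_UINT64 size;
	OPJ_UINT64 offset;
} mem_stream;

static OPJ_SIZE_T mem_read(void *buf, OPJ_SIZE_T n, void *user) {
	mem_stream *m = (mem_stream *)user;
	if (m->offset >= m->size)
		return (OPJ_SIZE_T)-1;  // OpenJPEG's EOF/error convention
	if (n > m->size - m->offset)
		n = (OPJ_SIZE_T)(m->size - m->offset);
	memcpy(buf, m->data + m->offset, n);
	m->offset += n;
	return n;
}

static OPJ_OFF_T mem_skip(OPJ_OFF_T n, void *user) {
	((mem_stream *)user)->offset += (OPJ_UINT64)n;
	return n;
}

static OPJ_BOOL mem_seek(OPJ_OFF_T n, void *user) {
	mem_stream *m = (mem_stream *)user;
	if (n < 0 || (OPJ_UINT64)n > m->size)
		return OPJ_FALSE;
	m->offset = (OPJ_UINT64)n;
	return OPJ_TRUE;
}

static void mem_free(void *user) {
	mem_stream *m = (mem_stream *)user;
	free(m->data);
	free(m);
}

// Takes ownership of a malloc'd buffer (e.g., from Go's C.CBytes) and wraps it
// in an input opj_stream_t using the callbacks above.
static opj_stream_t *new_mem_stream(void *data, OPJ_UINT64 size) {
	mem_stream *m = (mem_stream *)malloc(sizeof(mem_stream));
	m->data = (OPJ_UINT8 *)data;
	m->size = size;
	m->offset = 0;

	opj_stream_t *s = opj_stream_default_create(OPJ_TRUE);
	opj_stream_set_user_data(s, m, mem_free);
	opj_stream_set_user_data_length(s, size);
	opj_stream_set_read_function(s, mem_read);
	opj_stream_set_skip_function(s, mem_skip);
	opj_stream_set_seek_function(s, mem_seek);
	return s;
}
*/
import "C"

import "io"

// NewStream (hypothetical) buffers an image source (a file, an S3 object, a RAM
// cache entry) and hands OpenJPEG a stream over it. The caller passes the
// result to opj_read_header / opj_decode and eventually opj_stream_destroy.
func NewStream(r io.Reader) (*C.opj_stream_t, error) {
	data, err := io.ReadAll(r)
	if err != nil {
		return nil, err
	}
	return C.new_mem_stream(C.CBytes(data), C.OPJ_UINT64(len(data))), nil
}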
It would be valuable to do performance testing against S3-streamed vs. in-memory vs. on-disk JP2s if we implement this. If there's not a decent gain, that would be unfortunate but good to know. If there is, it would be worth rebuilding the S3 plugin to stream, and optionally to cache JP2s in RAM for small exhibits that need really fast tiles.