mbennett-uoe / whiiif

Simple IIIF Search service for OCRed texts

Work with canvases not at <manifest_url>/canvas/<canvas_id> #16

Open mbennett-uoe opened 4 years ago

mbennett-uoe commented 4 years ago

Currently, we generate the URLs for the IIIF Search API "on" property by assuming that every manifest references its canvases in the same way, as <manifest_url>/canvas/<canvas_id>: https://github.com/mbennett-uoe/whiiif/blob/a6e3bd8d7cecc8179678944fe8097ac759b18c3a/whiiif/views.py#L122

However, this format is not required by the IIIF Presentation spec, so we should try to support other formats.

mbennett-uoe commented 4 years ago

Three possible approaches:

  1. Assume that anyone running Whiiif will be using a static pattern for all the manifests they hold.
    • So, add a config variable for that format
    • Can use named replacement fields with Python's str.format to facilitate this
      • E.g. in the config: CANVAS_LOCATION = "{manifest_url}/mycustomconstruct/someotherthing/{canvas_id}"
      • and then in the view: app.config["CANVAS_LOCATION"].format(manifest_url='http://example.com/manifest', canvas_id='1337')
  2. As above, but assume the pattern is only static within each manifest
    • Potentially more flexible for people hosting Whiiif services for multiple disparate collections (possibly for different institutions or similar)
    • Store the data in the manifest's Solr document, similar to how manifest_url is stored currently
      • Either a config string with replaceable parts as above
      • Or a JSON dictionary of ALTO Page IDs -> Canvas URIs.
      • Easy to do this at ingest
  3. Replace ALTO Page block IDs with Canvas URIs at ingest time
    • Already have a script to walk through the ALTO + manifest and substitute in the canvas IDs; it would be trivial to change it to work with the full canvas URI
      • Although this script does work by generating a bunch of sed commands, so it might be better to take the idea and just reimplement it using the etree walker that we currently have for parsing ALTO
    • Might break ALTO format??
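
Approaches 1 and 2 can be combined in one small resolver. The following is only a sketch, not code from Whiiif: the function name `canvas_uri`, the `DEFAULT_CANVAS_LOCATION` constant, and the `page_map` argument are hypothetical, and it assumes the per-manifest pattern or mapping has already been loaded from the config or the manifest's Solr document.

```python
from typing import Mapping, Optional

# Hypothetical default, matching the layout that is currently assumed.
DEFAULT_CANVAS_LOCATION = "{manifest_url}/canvas/{canvas_id}"


def canvas_uri(
    manifest_url: str,
    canvas_id: str,
    pattern: str = DEFAULT_CANVAS_LOCATION,
    page_map: Optional[Mapping[str, str]] = None,
) -> str:
    """Resolve a canvas URI for one ALTO page / canvas ID.

    An explicit per-manifest mapping (approach 2, JSON dictionary variant)
    wins if it has an entry; otherwise the format pattern is used
    (approach 1, or approach 2 with a per-manifest pattern).
    """
    if page_map and canvas_id in page_map:
        return page_map[canvas_id]
    return pattern.format(manifest_url=manifest_url, canvas_id=canvas_id)


# Site-wide custom pattern, as in the config example above.
custom = "{manifest_url}/mycustomconstruct/someotherthing/{canvas_id}"
print(canvas_uri("http://example.com/manifest", "1337", pattern=custom))
```

Keeping the old layout as the default pattern means existing installs would not need a config change.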

mbennett-uoe commented 4 years ago

> Although this script does work by generating a bunch of sed commands, so it might be better to take the idea and just reimplement it using the etree walker that we currently have for parsing ALTO

Turns out I already did this so +1 for this part of the idea!
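
For illustration, an etree-based version of approach 3 could look roughly like the sketch below. This is not the existing Whiiif script, which is not shown in this thread: `replace_page_ids` is a hypothetical name, and the `canvas_uris` mapping from ALTO Page IDs to canvas URIs is assumed to have been built from the manifest beforehand. Elements are matched on their local name so that the ALTO namespace version does not matter.

```python
import xml.etree.ElementTree as ET


def replace_page_ids(alto_xml: str, canvas_uris: dict) -> str:
    """Return the ALTO document with Page IDs swapped for canvas URIs.

    canvas_uris maps existing Page ID values to full canvas URIs
    (hypothetical input, assumed to be derived from the manifest).
    Pages without an entry in the mapping are left untouched.
    """
    root = ET.fromstring(alto_xml)
    for elem in root.iter():
        # Strip any "{namespace}" prefix and compare the local tag name.
        if elem.tag.rsplit("}", 1)[-1] == "Page":
            page_id = elem.get("ID")
            if page_id in canvas_uris:
                elem.set("ID", canvas_uris[page_id])
    return ET.tostring(root, encoding="unicode")
```

On the "might break ALTO format" concern: a full URI contains characters such as ":" and "/" that are not allowed in an XML ID-typed value, so if the ALTO schema declares the Page ID attribute with that type, the rewritten files would probably no longer validate, even though they remain well-formed XML.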