michaelgfalk / ohrm-jsonld-exporter

Code to export an OHRM database to JSON-LD (loosely RO-Crate format)
GNU General Public License v3.0

handle multipage images better #2

Open michaelgfalk opened 9 months ago

michaelgfalk commented 9 months ago

At the moment, multipage images are not very accessible in the JSON-LD export.

Currently, the exporter just naively inserts the text in the dov column of the dobjectversion table into the JSON. This string needs to be parsed so that a separate URI is coined for each image in the output file.
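A minimal sketch of the parsing step, assuming Python. It only splits the string into the viewer page and its comma-separated query fields. The name parse_dov and the result keys are hypothetical. Treating the first field as the image directory is an assumption. The issue doesn't say what the trailing fields (345, 268, S) mean, so they are passed through uninterpreted.

```python
from urllib.parse import urlsplit


def parse_dov(dov: str) -> dict:
    """Split a multipage 'dov' value into its viewer page and query fields.

    Hypothetical helper. Assumes the shape seen in this issue:
    '<viewer page>?<field>,<field>,...'.
    """
    parts = urlsplit(dov)
    # The query part is a plain comma-separated list, not key=value pairs.
    fields = parts.query.split(",") if parts.query else []
    return {
        "viewer": parts.path,
        # Assumption: the first field names the image directory.
        "image_set": fields[0] if fields else None,
        # Meaning of the remaining fields is unknown; pass them through as-is.
        "other_fields": fields[1:],
    }
```

For objects/images/image_viewer_paged.htm?VPRS3622-2,345,268,S this gives the viewer objects/images/image_viewer_paged.htm, the image set VPRS3622-2, and the other fields ['345', '268', 'S']. Coining one URI per image would still require knowing which files belong to the set, and the string alone does not say.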

Example: objects/images/image_viewer_paged.htm?VPRS3622-2,345,268,S should be parsed into:

NB: It seems that the database is sometimes out of step with the actual contents of the 'objects' folder for a given OHRM. Thus it may be preferable not to parse the string in the dov field at all. Instead, a better approach might be to:

  1. Check whether the enclosing directory still exists (e.g. objects/images/VPRS3633-2/)
  2. If the directory doesn't exist, don't include any data about the files that (once) were indicated by the record
  3. If the directory does exist, enumerate all the files in it and add a corresponding 'DigitalObjectVersion' to the JSON-LD file
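The three steps above could look roughly like this. The sketch assumes Python, a filesystem path to the OHRM export root, and a directory name already extracted from the record. The function name versions_for_directory and the entity keys (@id, @type, name) are hypothetical. Only the type name 'DigitalObjectVersion' comes from this issue.

```python
from pathlib import Path


def versions_for_directory(ohrm_root: Path, image_dir: str) -> list:
    """Return one entity per file in image_dir, or [] if the directory is gone.

    Hypothetical helper; the output shape is an assumption.
    """
    directory = ohrm_root / image_dir
    # Steps 1-2: if the enclosing directory no longer exists, emit nothing.
    if not directory.is_dir():
        return []
    # Step 3: one 'DigitalObjectVersion' per regular file (subfolders skipped),
    # sorted so the export is stable between runs.
    entities = []
    for path in sorted(p for p in directory.iterdir() if p.is_file()):
        entities.append({
            # Path relative to the OHRM root, with forward slashes.
            "@id": path.relative_to(ohrm_root).as_posix(),
            "@type": "DigitalObjectVersion",
            "name": path.name,
        })
    return entities
```

Returning an empty list for a missing directory lets the caller splice the result into the graph unconditionally, which covers step 2 without a separate branch.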
michaelgfalk commented 7 months ago

The new solution is to just point to the directory, e.g. objects/images/image_viewer_paged.htm?VPRS3622-2,345,268,S => objects/images/VPRS3622-2. The corresponding object in the graph should have type "Dataset" rather than "File."
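A rough sketch of the new mapping, again assuming Python. The names dov_to_directory and directory_entity are hypothetical. The rule is inferred from the single example above rather than from a spec: keep the viewer page's parent folder, then append the first comma-separated query field.

```python
import posixpath
from urllib.parse import urlsplit


def dov_to_directory(dov: str) -> str:
    """Map a paged-viewer 'dov' value to the image directory it points at.

    Hypothetical helper; the rule is inferred from one example:
    'objects/images/image_viewer_paged.htm?VPRS3622-2,345,268,S'
    -> 'objects/images/VPRS3622-2'
    """
    parts = urlsplit(dov)
    if not parts.query:
        raise ValueError(f"not a paged-viewer reference: {dov!r}")
    # Folder containing the viewer page, e.g. 'objects/images'.
    parent = posixpath.dirname(parts.path)
    # First comma-separated query field, e.g. 'VPRS3622-2'.
    image_set = parts.query.split(",")[0]
    return posixpath.join(parent, image_set)


def directory_entity(dov: str) -> dict:
    """Build the graph node for the directory: a 'Dataset' rather than a 'File'.

    The keys used here are an assumption about the exporter's output shape.
    """
    return {"@id": dov_to_directory(dov), "@type": "Dataset"}
```

If the output is meant to follow RO-Crate more closely, note that the RO-Crate specification recommends that a directory's @id end with a slash (objects/images/VPRS3622-2/).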

Some notes: