internetarchive / iiif

The official Internet Archive IIIF service
GNU General Public License v3.0

Use linking properties to refer to files and pages #50

Closed saracarl closed 1 month ago

saracarl commented 6 months ago

<moved from https://github.com/ArchiveLabs/iiif.archivelab.org/issues/71 >

The IIIF spec supports use of linking properties like seeAlso and related to refer to other resources about an item from inside a manifest. The Internet Archive's abundant derivative files could be linked from IA manifests, allowing clients to consume these resources easily.

(Real-world use case: a IIIF client that imports a printed document, then wants to import the OCR text.)

To take an example, this item has a manifest with these simple linking properties:

   "related":"http://archive.org/details/hhfbc196-bingenontherhine",
   "seeAlso":"http://archive.org/metadata/hhfbc196-bingenontherhine",

Adding the derived files to the manifest would look like this:

{
   "@context":"http://iiif.io/api/presentation/2/context.json",
   "@id":"https://iiif.archivelab.org/iiif/hhfbc196-bingenontherhine/manifest.json",
   "@type":"sc:Manifest",
   "attribution":"The Internet Archive",
   "description":"20x11 cm.<br /><br />First Line: \"A Soldier of the Legion lay dying in Algiers\"<br /><br />This is a scanned copy of the original broadside in the Helen Hartness Flanders Collection at Middlebury College.",
   "label":"Bingen on the Rhine",
   "logo":"https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcReMN4l9cgu_qb1OwflFeyfHcjp8aUfVNSJ9ynk2IfuHwW1I4mDSw",
   "metadata":[
      {
         "label":"title",
         "value":"Bingen on the Rhine"
      },
      {
         "label":"publisher",
         "value":"n/a"
      },
      {
         "label":"subject",
         "value":"Helen Hartness Flanders. Broadsides. Folk music. Folksongs. Ballads. New England"
      },
      {
         "label":"date",
         "value":"18??"
      }
   ],
   "related": [
      {
         "@id": "http://archive.org/details/hhfbc196-bingenontherhine",
         "label": "Internet Archive page for Bingen on the Rhine",
         "format": "text/html"
      }
   ],
   "seeAlso": [
      {
         "@id": "http://archive.org/metadata/hhfbc196-bingenontherhine",
         "label": "Internet Archive Metadata",
         "format": "application/json"
      },
      {
         "label":"Item Tile",
         "@id":"https://archive.org/download/hhfbc196-bingenontherhine/__ia_thumb.jpg",
         "format": "image/jpeg"
      },
      {
         "label":"Animated GIF",
         "@id":"https://archive.org/download/hhfbc196-bingenontherhine/fbc.bs.196.bingenontherhine.jpg.gif",
         "format": "image/gif"
      },
      {
         "label":"Text PDF",
         "@id":"https://archive.org/download/hhfbc196-bingenontherhine/fbc.bs.196.bingenontherhine.jpg.pdf",
         "format": "application/pdf"
      },
      {
         "label":"Abbyy GZ",
         "@id":"https://archive.org/download/hhfbc196-bingenontherhine/fbc.bs.196.bingenontherhine.jpg_abbyy.gz",
         "format": "application/gzip"
      },
      {
         "label":"DjVuTXT",
         "@id":"https://archive.org/download/hhfbc196-bingenontherhine/fbc.bs.196.bingenontherhine.jpg_djvu.txt",
         "format": "text/plain"
      },
      {
         "label":"Djvu XML",
         "@id":"https://archive.org/download/hhfbc196-bingenontherhine/fbc.bs.196.bingenontherhine.jpg_djvu.xml",
         "format": "application/xml"
      },
      {
         "label":"Generic Raw Book Zip",
         "@id":"https://archive.org/download/hhfbc196-bingenontherhine/fbc.bs.196.bingenontherhine.jpg_images.zip",
         "format": "application/zip"
      },
      {
         "label":"Single Page Processed JP2 ZIP",
         "@id":"https://archive.org/download/hhfbc196-bingenontherhine/fbc.bs.196.bingenontherhine.jpg_jp2.zip",
         "format": "application/zip"
      },
      {
         "label":"Scandata",
         "@id":"https://archive.org/download/hhfbc196-bingenontherhine/fbc.bs.196.bingenontherhine.jpg_scandata.xml",
         "format": "application/xml"
      },
      {
         "label":"Archive BitTorrent",
         "@id":"https://archive.org/download/hhfbc196-bingenontherhine/hhfbc196-bingenontherhine_archive.torrent",
         "format": "application/x-bittorrent"
      },
      {
         "label":"File Metadata",
         "@id":"https://archive.org/download/hhfbc196-bingenontherhine/hhfbc196-bingenontherhine_files.xml",
         "format": "application/xml"
      },
      {
         "label":"Metadata",
         "@id":"https://archive.org/download/hhfbc196-bingenontherhine/hhfbc196-bingenontherhine_meta.sqlite",
         "format": "application/octet-stream"
      },
      {
         "label":"Item Metadata",
         "@id":"https://archive.org/download/hhfbc196-bingenontherhine/hhfbc196-bingenontherhine_meta.xml",
          "format": "application/xml"
      }
   ],
   "sequences":[
      {
         "@context":"http://iiif.io/api/image/2/context.json",
         "@id":"https://iiif.archivelab.org/iiif/hhfbc196-bingenontherhine/canvas/default",
         "@type":"sc:Sequence",
         "canvases":[
            {
               "@context":"http://iiif.io/api/presentation/2/context.json",
               "@id":"https://iiif.archivelab.org/iiif/hhfbc196-bingenontherhine$0/canvas",
               "@type":"sc:Canvas",
               "description":"",
               "height":2286,
               "images":[
                  {
                     "@context":"http://iiif.io/api/image/2/context.json",
                     "@id":"https://iiif.archivelab.org/iiif/hhfbc196-bingenontherhine$0/annotation",
                     "@type":"oa:Annotation",
                     "motivation":"sc:painting",
                     "on":"https://iiif.archivelab.org/iiif/hhfbc196-bingenontherhine$0/annotation",
                     "resource":{
                        "@id":"https://iiif.archivelab.org/iiif/hhfbc196-bingenontherhine$0/full/full/0/default.jpg",
                        "@type":"dctypes:Image",
                        "format":"image/jpeg",
                        "height":2286,
                        "service":{
                           "@context":"http://iiif.io/api/image/2/context.json",
                           "@id":"https://iiif.archivelab.org/iiif/hhfbc196-bingenontherhine$0",
                           "profile":"https://iiif.io/api/image/2/profiles/level2.json"
                        },
                        "width":1221
                     }
                  }
               ],
               "label":"p. ",
               "width":1221
            }
         ],
         "label":"default"
      }
   ],
   "thumbnail":{
      "@id":"https://ia600206.us.archive.org/BookReader/BookReaderPreview.php?id=hhfbc196-bingenontherhine&subPrefix=fbc.bs.196.bingenontherhine.jpg&itemPath=/29/items/hhfbc196-bingenontherhine&server=ia600206.us.archive.org&page=preview&"
   },
   "viewingHint":"paged"
}
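
For the client use case described at the top of this issue (importing a printed document, then its OCR text), a consumer could scan the manifest's linking properties by format. The following is a minimal Python sketch, assuming a v2 manifest already parsed into a dict. `linked_resources` is a hypothetical helper name, not part of any IIIF library. It normalizes the shapes a v2 linking property can take (bare URI string, single object, or list).

```python
def linked_resources(manifest, prop="seeAlso", fmt=None):
    """Return the @id of each entry under `prop`, optionally filtered by format."""
    value = manifest.get(prop, [])
    # A v2 linking property may be a bare URI, a single object, or a list.
    if isinstance(value, (str, dict)):
        value = [value]
    found = []
    for entry in value:
        if isinstance(entry, str):
            entry = {"@id": entry}  # a bare URI carries no format
        if fmt is None or entry.get("format") == fmt:
            found.append(entry["@id"])
    return found


# Trimmed-down manifest using two entries from the example above.
manifest = {
    "related": "http://archive.org/details/hhfbc196-bingenontherhine",
    "seeAlso": [
        {
            "label": "Text PDF",
            "@id": "https://archive.org/download/hhfbc196-bingenontherhine/fbc.bs.196.bingenontherhine.jpg.pdf",
            "format": "application/pdf",
        },
        {
            "label": "DjVuTXT",
            "@id": "https://archive.org/download/hhfbc196-bingenontherhine/fbc.bs.196.bingenontherhine.jpg_djvu.txt",
            "format": "text/plain",
        },
    ],
}

print(linked_resources(manifest, fmt="text/plain"))
```

Filtering on `format` alone is a simplification: several derivatives share `application/xml` or `application/zip`, so a real client would probably also need to inspect the label.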

saracarl commented 6 months ago

Good idea! I think rendering should be used instead of seeAlso for some resources, such as the PDF (especially the raw PDF without the text).

saracarl commented 6 months ago

See also: https://github.com/ArchiveLabs/iiif.archivelab.org/issues/11

digitaldogsbody commented 5 months ago

Am I right in thinking the related in the v2 example above has been replaced by what we are populating for homepage in the v3 versions?

Also, do we want to include all the derivatives, or just a subset? Can we come up with a list of which ones we think belong in rendering and which should be in seeAlso?

The language of the spec, specifically:

The rendering resource MUST be able to be displayed directly to a human user

seems to make clear that we must distinguish between, e.g., the PDF and the SQLite representations.

I am starting on the plumbing for this, so although we don't need to answer these questions immediately, it would be useful to resolve soon.

saracarl commented 4 months ago

Part of our mandate is to make as much data as possible discoverable via the API (I think of it as lighting the way for others who might want to include similar data). See here for another discussion of this: https://github.com/ArchiveLabs/iiif.archivelab.org/issues/11#issuecomment-1473006921

I'll work on which properties should be rendering and which seeAlso.

saracarl commented 4 months ago

These are the human-readable derivatives I decided should be rendering rather than seeAlso:

      {
         "label":"Item Tile",
         "@id":"https://archive.org/download/hhfbc196-bingenontherhine/__ia_thumb.jpg",
         "format": "image/jpeg"
      },
      {
         "label":"Animated GIF",
         "@id":"https://archive.org/download/hhfbc196-bingenontherhine/fbc.bs.196.bingenontherhine.jpg.gif",
         "format": "image/gif"
      },
      {
         "label":"Text PDF",
         "@id":"https://archive.org/download/hhfbc196-bingenontherhine/fbc.bs.196.bingenontherhine.jpg.pdf",
         "format": "application/pdf"
      },

I'm a bit confused about IIIF's required type property for each of these; the example in the spec uses "Text", but that isn't a type listed in IIIF's Additional Types or in the W3C Web Annotation Data Model.

Gist with updated example.
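
One way to encode the split above is a lookup on the IA format name. This is only a sketch in Python: it assumes the three names listed in this comment are the only rendering formats for now, and `RENDERING_FORMATS` and `classify` are hypothetical names, not taken from `iiify/resolver.py`.

```python
# IA format names judged human-readable in this comment. Everything else
# falls through to seeAlso. Illustrative only, not the final list.
RENDERING_FORMATS = {"Item Tile", "Animated GIF", "Text PDF"}


def classify(ia_format):
    """Return the IIIF linking property for an IA derivative format name."""
    return "rendering" if ia_format in RENDERING_FORMATS else "seeAlso"


for name in ("Text PDF", "DjVuTXT", "Abbyy GZ"):
    print(name, "->", classify(name))
```

Keying on the IA format name rather than the MIME type keeps derivatives that share a MIME type (for example, the several `application/xml` files) individually classifiable.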

saracarl commented 4 months ago

Quoting digitaldogsbody: "Am I right in thinking the related in the v2 example above has been replaced by what we are populating for homepage in the v3 versions?"

Yes, agreed, the homepage includes what seems to be the v2 related.
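
For readers comparing the two versions, the mapping could be sketched as follows. This assumes the v2 related entry has the object form shown in the example manifest and that labels are in English. `related_to_homepage` is a hypothetical helper, and the v3 field shapes (`id`, `type`, language-map `label`) follow the Presentation API 3.0 conventions rather than anything confirmed about the IA implementation.

```python
def related_to_homepage(related):
    """Convert one v2 `related` object into a v3 `homepage` value."""
    entry = {
        "id": related["@id"],
        "type": "Text",  # v3 requires a type; a home page is a web page
        # v3 labels are language maps; "Home page" is a placeholder fallback.
        "label": {"en": [related.get("label", "Home page")]},
    }
    if "format" in related:
        entry["format"] = related["format"]
    return [entry]  # in v3, homepage is an array of objects


v2_related = {
    "@id": "http://archive.org/details/hhfbc196-bingenontherhine",
    "label": "Internet Archive page for Bingen on the Rhine",
    "format": "text/html",
}
print(related_to_homepage(v2_related))
```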

benwbrum commented 4 months ago

This may be an edge case, but I'm wondering about the hOCR file type in https://archive.org/download/journalofexpedit00ford

PDF should plainly be rendering, while DjVu should be seeAlso, but hOCR is, at least in theory, human-readable. That said, when I view the file, none of the bounding-box information displays because there is no CSS or JS to process it, so I'm tempted to put it in seeAlso, since a human needs an external program to work with it.

glenrobson commented 4 months ago

seeAlso and format types:

https://www.w3.org/TR/annotation-model/

glenrobson commented 4 months ago

Add a more descriptive label than just the format. See viewer behaviour:

https://iiif.io/api/cookbook/recipe/0053-seeAlso/

saracarl commented 4 months ago

Another example with lots of derivatives is https://archive.org/metadata/journalofexpedit00ford

A/V example with derivatives: https://archive.org/metadata/DuckandC1951
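
The metadata responses above list every file in an item, which is the raw material for the linking properties. Below is a hedged Python sketch of turning that list into download URLs. It assumes the response carries a `files` array whose entries have `name`, `format`, and `source` fields, and it uses a small hand-written sample rather than a live request. `derivative_links` is a hypothetical helper name.

```python
from urllib.parse import quote


def derivative_links(metadata):
    """Yield (IA format name, download URL) for each derivative file."""
    identifier = metadata["metadata"]["identifier"]
    for f in metadata.get("files", []):
        if f.get("source") != "derivative":
            continue  # skip originals and item-level metadata files
        url = f"https://archive.org/download/{quote(identifier)}/{quote(f['name'])}"
        yield f.get("format", "Unknown"), url


# Hand-written sample in the assumed shape, not a real API response.
sample = {
    "metadata": {"identifier": "hhfbc196-bingenontherhine"},
    "files": [
        {"name": "fbc.bs.196.bingenontherhine.jpg",
         "source": "original", "format": "JPEG"},
        {"name": "fbc.bs.196.bingenontherhine.jpg_djvu.txt",
         "source": "derivative", "format": "DjVuTXT"},
    ],
}

for fmt, url in derivative_links(sample):
    print(fmt, url)
```

Filtering on `source == "derivative"` is an assumption made for this sketch. Files such as the item tile or `_meta.xml` may be tagged differently, so the filter would need adjusting to match whichever files are meant to appear in the manifest.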

glenrobson commented 4 months ago

Create a curated list of formats we want to see in rendering and seeAlso. Current list:

https://github.com/internetarchive/iiif/blob/5015806341df19a167751f154117ef18842462b8/iiify/resolver.py#L367

(with Web Anno format)

saracarl commented 3 months ago

I pulled all the derivatives out of both a book and a movie and put them in the following spreadsheet: https://docs.google.com/spreadsheets/d/1C3CXCvTPBxi-kmF9KfakXT_CLHlOs3XVvR-2P6QH5Fk/edit?usp=sharing I marked the rendering properties; the others are all potential seeAlsos. There are a lot, and we should discuss them.

TODO: I think we should also look at an image item's derivatives.

saracarl commented 3 months ago

Decided today:

  1. We'll include all derivatives unless we explicitly decide to leave them out.
  2. None of the IA formats for our seeAlsos appear in the list of profiles, so no profiles are needed.

See spreadsheet for list of derivatives and their seeAlso/rendering classification.
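
The two decisions above could be expressed roughly as follows. This is a sketch only: `select_derivatives` and `see_also_entry` are hypothetical names, the exclusion set is empty because the thread names no exclusions, and the entry uses the v2-style keys from the example manifest.

```python
def select_derivatives(files, excluded=frozenset()):
    """Decision 1: keep every (format, url) pair unless explicitly excluded."""
    return [(fmt, url) for fmt, url in files if fmt not in excluded]


def see_also_entry(fmt, url):
    """Decision 2: build a seeAlso object without a `profile` key."""
    return {"@id": url, "label": fmt}


files = [
    ("DjVuTXT", "https://archive.org/download/hhfbc196-bingenontherhine/fbc.bs.196.bingenontherhine.jpg_djvu.txt"),
    ("Scandata", "https://archive.org/download/hhfbc196-bingenontherhine/fbc.bs.196.bingenontherhine.jpg_scandata.xml"),
]

# With no exclusions, everything is kept.
for fmt, url in select_derivatives(files):
    print(see_also_entry(fmt, url))
```

Making inclusion the default means a newly introduced derivative format shows up in manifests without a code change; only exclusions have to be maintained.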