wellcomecollection / platform

Wellcome Collection Digital Platform
https://developers.wellcomecollection.org/
MIT License

Expose IIIF Collections, with correct partOf assertions #4480

Closed tomcrane closed 3 years ago

tomcrane commented 4 years ago

We will have Collections where the IIIF resource represents a single work (or rather, b number), such as a 3 volume book. These are IIIF collections but they relate directly to works. They are works.

We also need:

For aggregations - we can start using LoC identifiers for contributors instead of names derived from MARC.

E.g., https://api.wellcomecollection.org/catalogue/v2/works?identifiers=b30351261&include=identifiers,items,subjects,genres,contributors,production,notes,collection

For archive hierarchy - we can generate the IIIF Collection dynamically - https://digirati.slack.com/archives/CBT40CMKQ/p1597936607018500

IIIF Collections that Manifests assert themselves to be partOf in DDS.new:

tomcrane commented 4 years ago

Noting this idea before I forget it... For archives, the IIIF collections structure can for the most part just be a proxy/transformation of the Catalogue API at levels above the manifest. DDS doesn't need to store anything or know anything about archives, particularly. That's the Catalogue API's job. But in IIIF Collections, we can mix in an extra useful bit of info, from the DDS DB - the number of digitised items underneath a given level. The DDS DB does not need to know about the higher levels, it doesn't need rows for SubSeries or whatever - it can SUM from partial string matches on the CALM ref.

This can be put to user interface purposes by IIIF API consumers, but also could be a flag internally for whether to offer any child collections at all, if a branch has no digitised content at its leaves.
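A minimal sketch of that counting idea, assuming the digitised items are available as a flat list of CALM refs. The in-memory list stands in for whatever query the DDS DB would actually run, and every sample ref other than PPPBM/C/23/82/97 is hypothetical. The match is done on a "/" boundary so that PPPBM/C does not accidentally count refs under a sibling such as PPPBM/CX.

```python
from typing import Iterable


def is_at_or_below(calm_ref: str, level_ref: str) -> bool:
    """True if calm_ref is level_ref itself or sits beneath it in the hierarchy."""
    return calm_ref == level_ref or calm_ref.startswith(level_ref + "/")


def count_digitised(digitised_refs: Iterable[str], level_ref: str) -> int:
    """Count digitised items at or below a given level, by prefix match on the CALM ref."""
    return sum(1 for ref in digitised_refs if is_at_or_below(ref, level_ref))


def should_offer_child_collections(digitised_refs: Iterable[str], level_ref: str) -> bool:
    """Only offer a branch if there is digitised content somewhere beneath it."""
    return count_digitised(digitised_refs, level_ref) > 0


# Hypothetical sample data; only the first ref appears in this issue.
digitised = [
    "PPPBM/C/23/82/97",
    "PPPBM/C/23/82/98",
    "PPPBM/D/1",
    "PPPBM/CX/1",
]

print(count_digitised(digitised, "PPPBM/C"))                 # 2
print(should_offer_child_collections(digitised, "PPPBM/E"))  # False
```

The same number could be attached to each IIIF Collection as it is generated, so consumers can show it without DDS holding rows for the intermediate levels.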

tomcrane commented 3 years ago

Transferring some issues from Slack, mostly so I'm forced to write myself a summary and don't have to keep remembering the discussions around them. These might be covered by other issues.

1) How does the API represent the tree?

Consider PPPBM/C/23/82/97 (calm-ref-no), aka PP/PBM/C.97 (calm-altref-no)

(API endpoint)

Tree 1, from root down (this is the one used on https://wellcomelibrary.org/item/b17221262, which uses CALM refs)

PPPBM ([Medawar, Sir Peter Brian (1915-1987)])
 ∟ PPPBM/C (Scientific Research)
    ∟ PPPBM/C/23 (Other material on scientific research)
       ∟ PPPBM/C/23/82 (Research Project Files: work at Clinical Research Centre)
          ∟ PPPBM/C/23/82/97 (Immunosuppression in Diabetes)
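Because the calm-ref-no carries the hierarchy in the identifier, the ancestors in Tree 1 can be derived from the ref alone. A small sketch of that, where the function name is made up for illustration; it only applies to refs of this slash-separated form, not to the ALT-refs.

```python
from typing import List


def ancestor_refs(calm_ref: str) -> List[str]:
    """Return the ancestor refs of a hierarchical CALM ref, root first."""
    parts = calm_ref.split("/")
    # Every proper prefix of the path is an ancestor level.
    return ["/".join(parts[:i]) for i in range(1, len(parts))]


print(ancestor_refs("PPPBM/C/23/82/97"))
# ['PPPBM', 'PPPBM/C', 'PPPBM/C/23', 'PPPBM/C/23/82']
```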

Tree 2, from root down (this is used on https://search.wellcomelibrary.org/iii/encore/record/C__Rb1722126, which uses the ALT-refs; in these the structure is not conveyed in the identifier itself, but the same hierarchy is still presented for navigation)

PP/PBM ([Medawar, Sir Peter Brian (1915-1987)]) [Collection]
 ∟ PP/PBM/C (Scientific Research) [Section]
    ∟ PP/PBM/C.23-153 (Other material on scientific research) [SubSection]
       ∟ PP/PBM/C.82-98 (Research Project Files: work at Clinical Research Centre) [SubSubSection]
          ∟ PP/PBM/C.97 (Immunosuppression in Diabetes) [Item]

These are identical structurally; Tree 2 just uses a non-hierarchical identifier. Every level is navigable.

The Catalogue API has this (abbreviated for clarity):

{
   "id": "ph328sg4",
   "title": "Immunosuppression in Diabetes",
   "ref": "PP/PBM/C.97",
   "partOf": [
      { "ref": "PP/PBM", "title": "Medawar, Sir Peter Brian (1915-1987)" },
      { "ref": "PP/PBM/C", "title": "Scientific Research" },
      { "ref": "PP/PBM/C.23-153", "title": "Other material on scientific research" },
      { "ref": "PP/PBM/C.82-98", "title": "Research Project Files: work at Clinical Research Centre",
         "partOf": [{ "ref": "PP/PBM/C.23-153", "title": "Other material on scientific research",
            "partOf": [{ "ref": "PP/PBM/C", "title": "Scientific Research",
               "partOf": [{ "ref": "PP/PBM", "title": "Medawar, Sir Peter Brian (1915-1987)" }]
            }]
         }]
      }
   ]
}

Is this a transitional representation (from the discussion in Slack)?

That is, can I assume that it might look like this in future:

{
   "id": "ph328sg4",
   "title": "Immunosuppression in Diabetes",
   "ref": "PP/PBM/C.97",
   "partOf": [{ "ref": "PP/PBM/C.82-98", "title": "Research Project Files: work at Clinical Research Centre",
      "partOf": [{ "ref": "PP/PBM/C.23-153", "title": "Other material on scientific research",
         "partOf": [{ "ref": "PP/PBM/C", "title": "Scientific Research",
            "partOf": [{ "ref": "PP/PBM", "title": "Medawar, Sir Peter Brian (1915-1987)" }]
         }]
      }]
   }]
}

...and that a safe option for traversing this now would be to assume that the LAST partOf in an array is the parent of the current item (this strategy would walk up the tree correctly for both of the above examples).
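A sketch of that "last partOf is the parent" traversal, to check it against both shapes. The dictionaries mirror the abbreviated examples above (titles dropped, and "ref" is the abbreviated key used here, not necessarily the real API field name); the function name is hypothetical.

```python
from typing import Any, Dict, List


def ancestors_nearest_first(work: Dict[str, Any]) -> List[str]:
    """Walk up the tree, treating the LAST entry of each partOf array as the parent."""
    refs: List[str] = []
    node = work
    while node.get("partOf"):
        node = node["partOf"][-1]
        refs.append(node["ref"])
    return refs


# The nested chain that both representations share.
chain = {
    "ref": "PP/PBM/C.82-98",
    "partOf": [{
        "ref": "PP/PBM/C.23-153",
        "partOf": [{
            "ref": "PP/PBM/C",
            "partOf": [{"ref": "PP/PBM"}],
        }],
    }],
}

# Current shape: flat ancestors first, then the nested parent last.
current = {
    "ref": "PP/PBM/C.97",
    "partOf": [
        {"ref": "PP/PBM"},
        {"ref": "PP/PBM/C"},
        {"ref": "PP/PBM/C.23-153"},
        chain,
    ],
}

# Possible future shape: only the nested parent.
future = {"ref": "PP/PBM/C.97", "partOf": [chain]}

print(ancestors_nearest_first(current))
# ['PP/PBM/C.82-98', 'PP/PBM/C.23-153', 'PP/PBM/C', 'PP/PBM']
print(ancestors_nearest_first(current) == ancestors_nearest_first(future))  # True
```

This only holds while the nested parent stays last in the array; if the ordering changed, the walk would silently pick the wrong node.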

2) Missing levels

(Slack discussion)

Item: https://api.wellcomecollection.org/catalogue/v2/works/k75ersvg?include=partOf

This is missing https://search.wellcomelibrary.org/iii/encore/record/C__Rb1948840; the API response has two ancestors, but lacks an additional direct parent, PP/CRI/D/1.

(aside, for interest - in PP/CRI, the calm-ref and calm-alt-ref have the same form, apart from the initial PP/CRI vs PPCRI).

3) No tree at all for archives from other places

(Slack discussion)

Example: Letter from J B S Haldane to Fred A Smith

This is workType h (Archives and Manuscripts) rather than archive-item (the ones above, which do exhibit tree structure, are archive-item).

It doesn't have a referenceNumber property, nor parts, partOf, etc.

This collection came from UCL, but is still described in CALM and exposed as IIIF (and is a significant digitised collection), so it needs those archive properties in the Catalogue API.

tomcrane commented 3 years ago

Note to self: this gives you the top level archive entry point:

https://api.wellcomecollection.org/catalogue/v2/works?workType=archive-collection&include=identifiers,items,subjects,genres,contributors,production,notes,parts,partOf,precededBy,succeededBy
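For reference, a sketch of building that query string in code rather than pasting it around. It only assembles the URL shown above and makes no request; the constant and function names are made up for illustration.

```python
from urllib.parse import urlencode

WORKS_ENDPOINT = "https://api.wellcomecollection.org/catalogue/v2/works"

# The includes used in the URL above.
INCLUDES = [
    "identifiers", "items", "subjects", "genres", "contributors",
    "production", "notes", "parts", "partOf", "precededBy", "succeededBy",
]


def archive_collections_url() -> str:
    """Build the query for top-level archive collections."""
    params = {"workType": "archive-collection", "include": ",".join(INCLUDES)}
    # safe="," keeps the commas in the include list unescaped.
    return f"{WORKS_ENDPOINT}?{urlencode(params, safe=',')}"


print(archive_collections_url())
```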

tomcrane commented 3 years ago

This is done, but needs a lot of checking for archival hierarchy.