wellcomecollection / platform

Wellcome Collection Digital Platform
https://developers.wellcomecollection.org/
MIT License

Metadata visibility: Ownership and custodial history [MARC 561] - discovery #5758

Open jcateswellcome opened 1 month ago

jcateswellcome commented 1 month ago

Background

Currently the data element 'ownership and custodial history' is not displaying on the works page for records from Sierra.

This is MARC element 561

Given the nature of the collection, there are a range of

Sample record: https://wellcomecollection.org/works/yw9umag7 does not contain much information, whereas the 561 contains a great deal of information relating to the history of the item, as can be seen in the reporting cluster (query: varField.marcTag : "561", filtered on parent.id : '3338926').

Consider how this transformation could be accommodated in the work model. For example, could we map it as we do for archival collections, which show an 'ownership note'?

This should be approached collaboratively, i.e. with collections information. This will help us better understand the user need and rationale for accessing this.

Part of: #5757

kenoir commented 1 week ago

Some notes from reviewing the conversation so far:

Relevant MARC fields:

Questions:

Some terms of interest:

kenoir commented 1 week ago

To answer:

561 is mentioned in the catalogue pipeline: https://github.com/search?q=repo%3Awellcomecollection%2Fcatalogue-pipeline+561+language%3AScala&type=code&l=Scala How does this result in 561 appearing on works?

Of the 3 relevant MARC fields mentioned by Collection Information, we only appear to extract information from MARC 561, and then only in specific circumstances. The logic for this is contained in the MarcNotes class, here.

There is a helpful comment indicating:

  /** Conditionally create an Ownership note, depending on the privacy of the
    * original note.
    *
    * A note is only created if it is explicitly marked as "Not Private" by
    * setting first indicator to 1
    *
    * https://www.loc.gov/marc/bibliographic/bd561.html
    */

Notes are added as a top level attribute of a work and can be of a constrained variety of types listed here, e.g. "binding-detail", "exhibitions-note" & "ownership-note".

wellcomecollection.org will render these with labels provided by the catalogue API, see this example.

Screenshot 2024-09-02 at 14 23 25

wellcomecollection.org: https://wellcomecollection.org/works/u4hc2hwe

Screenshot 2024-09-02 at 14 23 44

API: https://api.wellcomecollection.org/catalogue/v2/works/u4hc2hwe?include=notes
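As a rough illustration of how notes could be grouped under the labels supplied alongside them, here is a minimal Python sketch. The payload is hand-written for illustration, and its field names (`noteType.id`, `noteType.label`, `contents`) are assumptions about the shape of the `notes` attribute rather than a copy of a real API response. The helper names are hypothetical too.

```python
# Hypothetical payload: the structure and field names are assumptions about
# what a work looks like when requested with ?include=notes.
example_work = {
    "id": "example",
    "notes": [
        {
            "noteType": {"id": "ownership-note", "label": "Ownership note"},
            "contents": ["Example provenance text."],
        },
        {
            "noteType": {"id": "binding-detail", "label": "Binding detail"},
            "contents": ["Example binding text."],
        },
    ],
}


def notes_by_label(work):
    """Group note contents under the label that accompanies each note type."""
    grouped = {}
    for note in work.get("notes", []):
        label = note["noteType"]["label"]
        grouped.setdefault(label, []).extend(note["contents"])
    return grouped


def ownership_notes(work):
    """Return only the contents of notes typed as "ownership-note"."""
    return [
        text
        for note in work.get("notes", [])
        if note["noteType"]["id"] == "ownership-note"
        for text in note["contents"]
    ]


if __name__ == "__main__":
    print(notes_by_label(example_work))
    print(ownership_notes(example_work))
```

A work with no ownership note (such as one whose 561 was treated as private) would simply yield an empty list from `ownership_notes`.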

Now we can look to answer:

561 is missing here: https://wellcomecollection.org/works/yw9umag7 https://wellcomecollection.org/works/dwg8xwrc Why?

yw9umag7 is sourced from b33389263, which is missing a value in indicator 1. The 561 is therefore treated as private and is not transformed onto the work.

dwg8xwrc is sourced from b3347817xz, [which contains a "1" value in indicator 1](https://reporting.wellcomecollection.org/s/sierra/app/r/s/s8Rmz) and the values "MS. annotations in Latin in two different hands., UkLW", so the ownership note "MS. annotations in Latin in two different hands." appears.

I believe subfield 5 (where the UkLW value is contained) is suppressed when transforming notes.
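The behaviour described above can be sketched in Python as follows. This is not the pipeline's Scala implementation: the `VarField` type, the function name, and the assumption that the note text lives in subfield a are all hypothetical, and the suppression of subfield 5 reflects the belief stated above rather than confirmed code.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class VarField:
    """Hypothetical, simplified stand-in for a Sierra MARC varField."""
    marc_tag: str
    ind1: str = " "
    subfields: List[Tuple[str, str]] = field(default_factory=list)


# Assumption: subfield 5 (institution code, e.g. UkLW) is dropped from the note.
SUPPRESSED_SUBFIELDS = {"5"}


def ownership_note(var_field: VarField) -> Optional[str]:
    """Return note text only when a 561 is explicitly marked "not private"."""
    if var_field.marc_tag != "561":
        return None
    # A blank or "0" first indicator is treated as private: no note is created.
    if var_field.ind1 != "1":
        return None
    parts = [
        content
        for tag, content in var_field.subfields
        if tag not in SUPPRESSED_SUBFIELDS
    ]
    text = " ".join(parts).strip()
    return text or None


# Shaped like the two cases discussed above (the subfield tags are assumed).
blank_indicator = VarField("561", " ", [("a", "Example provenance text.")])
not_private = VarField(
    "561",
    "1",
    [("a", "MS. annotations in Latin in two different hands."), ("5", "UkLW")],
)

if __name__ == "__main__":
    print(ownership_note(blank_indicator))
    print(ownership_note(not_private))
```

Under these assumptions the first field produces no note, while the second produces the annotation text without the UkLW code.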

kenoir commented 1 week ago

Some further questions to answer using the Sierra data available:

An addendum to this ^: I haven't found anywhere that we pay attention to any of these fields for item level records although there are item records with data in the 561 fields. Querying #collection-info to see if this is expected behaviour.

kenoir commented 1 week ago

How many bib records have 561 fields with contents?

varField.marcTag : 561 AND parent.recordType : "bibs" [7892 hits]

Of the bib records with 561 fields, how often is the first indicator set to 1, i.e. explicitly marked "not private"?

varField.marcTag : 561 AND parent.recordType : "bibs" and varField.ind1 : "1" [5667 hits]

Of these "not private" records:

Screenshot 2024-09-02 at 15 00 11

https://reporting.wellcomecollection.org/s/sierra/app/r/s/W5o0e

kenoir commented 1 week ago

Questions relating to the 541 field:

kenoir commented 1 week ago

How many bib records have 541 fields with contents?

varField.marcTag : 541 AND parent.recordType : "bibs" [23718]

These break down like this:

Screenshot 2024-09-02 at 15 12 20

https://reporting.wellcomecollection.org/s/sierra/app/r/s/TKTzT

Of the bib records with 541 fields, how often is the first indicator set to 1, i.e. explicitly marked "not private"?

varField.marcTag : 541 AND parent.recordType : "bibs" and varField.ind1 : "1" [586]
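To put the hit counts for 561 and 541 side by side, here is a small Python sketch that turns them into percentages. It assumes each hit corresponds to one varField on a bib record, so the figures describe the share of fields marked "not private" and may differ slightly from the share of distinct bib records. The function and variable names are illustrative.

```python
def share_not_private(not_private_hits: int, total_hits: int) -> float:
    """Percentage of hits whose first indicator is 1, to one decimal place."""
    return round(100 * not_private_hits / total_hits, 1)


# (hits with ind1 = "1", all hits) as reported by the queries in this thread.
hit_counts = {
    "561": (5667, 7892),
    "541": (586, 23718),
}

shares = {
    tag: share_not_private(not_private, total)
    for tag, (not_private, total) in hit_counts.items()
}

if __name__ == "__main__":
    for tag, share in shares.items():
        print(f"{tag}: {share}% explicitly not private")
    # 561: 71.8% explicitly not private
    # 541: 2.5% explicitly not private
```

So most 561 fields would already pass the "not private" check, whereas applying the same rule to 541 would surface very little.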

kenoir commented 1 week ago

Questions relating to the 591 field:

kenoir commented 1 week ago

How many bib records have 591 fields with contents (and are there any patterns in the contents)?

varField.marcTag : 591 AND parent.recordType : "bibs" [149114]

These break down like this (this samples only 5000 records and this dataset is much larger):

Screenshot 2024-09-02 at 15 39 26

Can we tell if the data in 591 fields should be private?

I can't! But #collections-info may be able to help work this out ... none of these fields have a value for indicator 1, but as this is a locally defined field perhaps there is a Wellcome Collection standard for understanding privacy here? Otherwise we can inspect the 591 data for patterns that we think match private data?

kenoir commented 1 week ago

To summarise for now:

This is about as far as I can get using the reporting cluster data.

kenoir commented 2 days ago

Slack thread with more context: https://wellcome.slack.com/archives/CGXDT2GSH/p1723802773006609