wellcomecollection / catalogue-api

:crystal_ball: The API for searching the Wellcome Collection catalogue.
https://developers.wellcomecollection.org
MIT License

Archive records not showing up in the hierarchy MS.6309 #782

Closed: pollecuttn closed this 2 months ago

pollecuttn commented 2 months ago

What

MS.6309 / Application numbers 1072-1191. / https://wellcomecollection.org/works/tgyghy3f has two works which should display below it in the hierarchy but don't.

The works exist on the site, but aren't in the hierarchy.

We need those two works to appear correctly in the hierarchy.

Collections Information have checked the records in CALM and Sierra and can't see anything wrong there.

The references are all correct to build the hierarchy.

To do

paul-butcher commented 2 months ago

Neither of these records has a collectionPath or referenceNumber in the transformer output:

works-identified-2024-05-13/_doc/zcyv6w3f
works-identified-2024-05-13/_doc/txsvmm67
...
"holdings": [],
<should be here>
"imageData": [],
...

Contrast with works-identified-2024-05-13/_doc/qt94qk2q

...
"holdings": [],
"collectionPath": {
  "path": "MS6245/6284/6303/6308/1",
  "label": "MS.6308/1"
},
"referenceNumber": "MS.6308/1",
"imageData": [],
...
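To see why a missing collectionPath matters, here is a minimal Python sketch that groups works under their parent path and collects any work that has no collectionPath. It assumes documents shaped like the excerpts above, reduced to plain dicts; `build_hierarchy` is a hypothetical helper and is not how the catalogue API actually builds its tree view.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def build_hierarchy(works: Dict[str, Dict[str, Any]]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Hypothetical helper: group work labels by parent path and list works with no collectionPath."""
    children: Dict[str, List[str]] = defaultdict(list)
    orphans: List[str] = []
    for work_id, doc in works.items():
        collection_path = doc.get("collectionPath")
        if not collection_path:
            # Nothing to attach this work to, so it cannot appear in any hierarchy.
            orphans.append(work_id)
            continue
        # "MS6245/6284/6303/6308/1" -> parent "MS6245/6284/6303/6308"
        parent, _, _ = collection_path["path"].rpartition("/")
        children[parent].append(collection_path["label"])
    return dict(children), sorted(orphans)


works = {
    "qt94qk2q": {
        "holdings": [],
        "collectionPath": {"path": "MS6245/6284/6303/6308/1", "label": "MS.6308/1"},
        "referenceNumber": "MS.6308/1",
        "imageData": [],
    },
    "zcyv6w3f": {"holdings": [], "imageData": []},
    "txsvmm67": {"holdings": [], "imageData": []},
}

children, orphans = build_hierarchy(works)
print(children)  # {'MS6245/6284/6303/6308': ['MS.6308/1']}
print(orphans)   # ['txsvmm67', 'zcyv6w3f']
```

In this simplified model a work with no collectionPath never gets a parent, so it cannot appear under MS.6309 however correct its source references are.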
paul-butcher commented 2 months ago

As stated in the initial description, the MARC for these records looks as expected.
Syntactically, the non-functioning record b2027427 (aka MS.6309/1, txsvmm67) looks identical to the functioning record b2027345 (aka MS.6308/1, qt94qk2q).

paul-butcher commented 2 months ago

The sourceModifiedTime is from the 16th, but there appear to be later versions.

That said, I don't think the difference between the 15th and 16th looks pertinent. It could be Harvest stuff.

paul-butcher commented 2 months ago

I had wondered whether something had gone awry and leaked out of that particular pipeline. This is not the case, as the 2024-02-19 pipeline has the same problem.

paul-butcher commented 2 months ago

Having attempted to push b2027427 version 2048 through the pipeline, I found it was discarded by the bit that checks whether we already have an identical document. So it's clearly a persistent problem transforming those specific works, rather than an ephemeral "falling out the side of the pipe" problem.
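The discard step can be pictured as a content comparison. The sketch below is a hypothetical stand-in (the real pipeline may compare versions or timestamps rather than hashes): it hashes a canonical JSON form of each transformed document and skips indexing when that hash matches what is already stored. `content_hash` and `should_index` are illustrative names only.

```python
import hashlib
import json
from typing import Any, Dict


def content_hash(doc: Dict[str, Any]) -> str:
    """Hash a document so that key order does not affect the result."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def should_index(work_id: str, doc: Dict[str, Any], stored: Dict[str, str]) -> bool:
    """Hypothetical check: index only if the transformed output differs from what is stored."""
    new_hash = content_hash(doc)
    if stored.get(work_id) == new_hash:
        return False  # identical output: discard without indexing
    stored[work_id] = new_hash
    return True


stored: Dict[str, str] = {}
doc = {"holdings": [], "imageData": []}
first = should_index("txsvmm67", doc, stored)
# Re-sending the same content (even with keys in another order) is discarded.
second = should_index("txsvmm67", {"imageData": [], "holdings": []}, stored)
print(first, second)  # True False
```

If the transformer is deterministic, re-sending an unchanged source record yields the same output and is dropped, which is consistent with the fault being persistent rather than ephemeral.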

paul-butcher commented 2 months ago

Ah! I've been silly. I think it's a CALM transformer or merge problem. This problem also appears in the 2024-02-19 pipeline.

paul-butcher commented 2 months ago

Epiphanies are happening. A working record, xmjsv3ag, is the Work derived from a CALM record. The non-working ones (e.g. zcyv6w3f) are Works derived from Sierra records.

paul-butcher commented 2 months ago

This is because the two bad CALM Works are marked as deleted (MS.6309/1: 66dea80f-99d6-49d9-a34b-edcdc6ccd021 and MS.6309/2: 460ca1d4-5b0a-49c5-b4ca-e32acd3e8888):

GET works-identified-2024-02-19/_doc/xtdb38x4
GET works-identified-2024-02-19/_doc/d4ced62y
...
"deletedReason": {
  "info": "Calm",
  "type": "SuppressedFromSource"
},
"type": "Deleted"
...
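Putting the last two observations together: the working records come from CALM, and the two CALM Works for MS.6309/1 and MS.6309/2 are Deleted with reason SuppressedFromSource. The Python sketch below is a hypothetical simplification of a merge step, not the real merger: it copies collectionPath and referenceNumber from a live CALM Work onto its Sierra Work, and copies nothing when the CALM Work is Deleted. `is_suppressed_from_calm`, `merge_hierarchy_fields`, and the "Visible" type value in the example are assumptions for illustration.

```python
from typing import Any, Dict, Optional


def is_suppressed_from_calm(doc: Dict[str, Any]) -> bool:
    """True if the work is Deleted with reason SuppressedFromSource from Calm."""
    if doc.get("type") != "Deleted":
        return False
    reason = doc.get("deletedReason", {})
    return reason.get("type") == "SuppressedFromSource" and reason.get("info") == "Calm"


def merge_hierarchy_fields(sierra: Dict[str, Any], calm: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical merge: copy hierarchy fields from a live CALM work onto a Sierra work."""
    merged = dict(sierra)
    if calm is None or calm.get("type") == "Deleted":
        # No usable CALM work: the Sierra work keeps only its own fields.
        return merged
    for field in ("collectionPath", "referenceNumber"):
        if field in calm:
            merged[field] = calm[field]
    return merged


sierra_work = {"holdings": [], "imageData": []}
calm_suppressed = {
    "type": "Deleted",
    "deletedReason": {"info": "Calm", "type": "SuppressedFromSource"},
}
calm_live = {
    "type": "Visible",  # assumed type value for a live work
    "collectionPath": {"path": "MS6245/6284/6303/6308/1", "label": "MS.6308/1"},
    "referenceNumber": "MS.6308/1",
}

broken = merge_hierarchy_fields(sierra_work, calm_suppressed)
working = merge_hierarchy_fields(sierra_work, calm_live)
print(is_suppressed_from_calm(calm_suppressed))  # True
print("collectionPath" in broken)                # False
print(working["referenceNumber"])                # MS.6308/1
```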
paul-butcher commented 2 months ago

This is true in both 2024-02-19 (previous live) and 2024-05-13 (current live), suggesting that it reflects the state of the source data, rather than being an odd fault.

GET works-identified-2024-05-13/_doc/xtdb38x4
GET works-identified-2024-05-13/_doc/d4ced62y
paul-butcher commented 2 months ago

Not deleted, suppressed. That makes more sense.

They are both suppressed because CatalogueStatus is Uncatalogued in CALM. All three (the two broken ones and the working one I've been comparing them to) have a retrievedAt value on the same day: 2023-11-24.
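The suppression rule described here can be sketched as a small Python predicate. This assumes a CALM record is available as a mapping from field name to a list of string values; `suppression_reason` is hypothetical, "Uncatalogued" is the only status taken from this thread, and the "RefNo" field and "Catalogued" value in the example are illustrative assumptions. The real transformer may apply further rules.

```python
from typing import Dict, List, Optional

# Hypothetical: only the status named in this thread triggers suppression here.
SUPPRESSED_STATUSES = {"uncatalogued"}


def suppression_reason(record: Dict[str, List[str]]) -> Optional[str]:
    """Return why a CALM record would be suppressed, or None if it should be shown."""
    for status in record.get("CatalogueStatus", []):
        if status.strip().lower() in SUPPRESSED_STATUSES:
            return f"CatalogueStatus is {status.strip()}"
    return None


before = {"RefNo": ["MS.6309/1"], "CatalogueStatus": ["Uncatalogued"]}
after = {"RefNo": ["MS.6309/1"], "CatalogueStatus": ["Catalogued"]}  # assumed new value
print(suppression_reason(before))  # CatalogueStatus is Uncatalogued
print(suppression_reason(after))   # None
```

Under this rule, changing CatalogueStatus in CALM is enough for the record to be re-harvested as a live Work, with no change needed to the Sierra records.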

pollecuttn commented 2 months ago

Collections Information colleagues have been asked to check whether the two CALM records are catalogued and, if so, when they were last edited.

pollecuttn commented 2 months ago

Collections Information have updated the records' CatalogueStatus from 'Uncatalogued'.

MS.6309/1 and MS.6309/2 are now showing up under MS.6309 in the hierarchy as expected https://wellcomecollection.org/works/tgyghy3f.