pollecuttn closed this issue 2 months ago.
Neither of these records has a collectionPath or referenceNumber coming out of the transformer:
works-identified-2024-05-13/_doc/zcyv6w3f
works-identified-2024-05-13/_doc/txsvmm67
...
"holdings": [],
<should be here>
"imageData": [],
...
Contrast with works-identified-2024-05-13/_doc/qt94qk2q:
...
"holdings": [],
"collectionPath": {
  "path": "MS6245/6284/6303/6308/1",
  "label": "MS.6308/1"
},
"referenceNumber": "MS.6308/1",
"imageData": [],
...
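To spot other works affected in the same way, one could scan transformed documents for these two fields. A minimal sketch, assuming each work is available as a parsed JSON dict shaped like the excerpts above; the helper name `missing_hierarchy_fields` is hypothetical and not part of the pipeline:

```python
# Hypothetical helper (not part of the pipeline): lists which of the
# hierarchy fields are absent or null in a transformed work document.
HIERARCHY_FIELDS = ("collectionPath", "referenceNumber")


def missing_hierarchy_fields(work: dict) -> list:
    """Return the names of hierarchy fields missing from `work`."""
    return [field for field in HIERARCHY_FIELDS if work.get(field) is None]


# Document shapes copied from the two excerpts above (other fields elided).
broken = {"holdings": [], "imageData": []}
working = {
    "holdings": [],
    "collectionPath": {"path": "MS6245/6284/6303/6308/1", "label": "MS.6308/1"},
    "referenceNumber": "MS.6308/1",
    "imageData": [],
}

print(missing_hierarchy_fields(broken))   # ['collectionPath', 'referenceNumber']
print(missing_hierarchy_fields(working))  # []
```

A missing field is treated the same as an explicit null, since either way the work cannot be placed in the hierarchy.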
As stated in the initial description, the MARC for these records looks as expected.
Syntactically, the non-functioning record b2027427 (aka MS.6309/1, txsvmm67) looks identical to the functioning record b2027345 (aka MS.6308/1, qt94qk2q).
The sourceModifiedTime is from the 16th, but there appear to be later versions.
That said, I don't think the differences between the 15th and the 16th are pertinent. Could be Harvest stuff.
I had wondered whether something had gone awry and leaked out of that particular pipeline. This is not the case, as the 2024-02-19 pipeline has the same problem.
When I tried to push b2027427 version 2048 through the pipeline, it was discarded by the step that checks whether we already have an identical document. So this is clearly a persistent problem with transforming those specific works, rather than an ephemeral "falling out the side of the pipe" problem.
Ah! I've been silly. I think it's a CALM transformer or merge problem. This problem also appears in the 2024-02-19 pipeline.
This is because the two bad CALM Works are marked as deleted (MS.6309/1: 66dea80f-99d6-49d9-a34b-edcdc6ccd021, and MS.6309/2: 460ca1d4-5b0a-49c5-b4ca-e32acd3e8888):
GET works-identified-2024-02-19/_doc/xtdb38x4
GET works-identified-2024-02-19/_doc/d4ced62y
...
"deletedReason": {
  "info": "Calm",
  "type": "SuppressedFromSource"
},
"type": "Deleted"
...
This is true in both 2024-02-19 (previous live) and 2024-05-13 (current live), suggesting that it reflects the state of the source data, rather than being an odd fault.
GET works-identified-2024-05-13/_doc/xtdb38x4
GET works-identified-2024-05-13/_doc/d4ced62y
Not deleted - Suppressed. That makes more sense.
They are both suppressed because CatalogueStatus is Uncatalogued in CALM. All three (the two broken ones and the working one I've been comparing them to) have a retrievedAt value on the same day: 2023-11-24.
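The suppression rule can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the transformer's actual code: the function name, the `RefNo` key, the `Visible` output type and the `Catalogued` status value are all hypothetical; only the `deletedReason` values and `"type": "Deleted"` come from the documents quoted above.

```python
# Simplified, hypothetical sketch of the rule described above; this is not
# the real CALM transformer. Assumed: the record is a dict, the status lives
# under "CatalogueStatus", the reference number under "RefNo".
SUPPRESSED_STATUS = "Uncatalogued"


def transform_calm_record(record: dict) -> dict:
    """Map a CALM record to a (much reduced) work document."""
    if record.get("CatalogueStatus") == SUPPRESSED_STATUS:
        # Suppressed records come out as Deleted works with the reason seen
        # in works-identified, so they carry no collectionPath and cannot be
        # placed in the hierarchy.
        return {
            "deletedReason": {"info": "Calm", "type": "SuppressedFromSource"},
            "type": "Deleted",
        }
    # Assumed shape for a record that passes the check.
    return {"type": "Visible", "referenceNumber": record["RefNo"]}


before_fix = {"RefNo": "MS.6309/1", "CatalogueStatus": "Uncatalogued"}
after_fix = {"RefNo": "MS.6309/1", "CatalogueStatus": "Catalogued"}  # hypothetical value

print(transform_calm_record(before_fix)["type"])  # Deleted
print(transform_calm_record(after_fix)["type"])   # Visible
```

Under this sketch, the MARC and the hierarchy data can be perfectly correct and the work still disappears from the hierarchy: the only thing that matters is the CALM status, which is why changing it at source is the fix rather than anything in the pipeline.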
Collections Information colleagues were asked to check whether the two CALM records are catalogued and, if so, when they were last edited.
Collections Information then updated the records so that they are no longer 'Uncatalogued'.
MS.6309/1 and MS.6309/2 are now showing up under MS.6309 in the hierarchy as expected https://wellcomecollection.org/works/tgyghy3f.
What
MS.6309 / Application numbers 1072-1191. / https://wellcomecollection.org/works/tgyghy3f has two works that should display below it in the hierarchy but don't:
As can be seen, the works exist on the site, but aren't in the hierarchy.
We need those two works to appear correctly in the hierarchy.
Collections and Information have checked the records in CALM and Sierra and can't see anything wrong there:
To do