wellcomecollection / platform

Wellcome Collection Digital Platform
https://developers.wellcomecollection.org/
MIT License

Some Sierra edits do not show up on wc.org #5468

Closed: paul-butcher closed this issue 2 years ago

paul-butcher commented 2 years ago

From this conversation: https://wellcome.slack.com/archives/C8X9YKM5X/p1648054810673729

This Sierra record https://search.wellcomelibrary.org/iii/encore/record/C__Rb3303105?lang=eng corresponds to this wc.org record: https://wellcomecollection.org/works/s2q7embx

The latter is expected to "be e-journal and should have a link to the resource"

paul-butcher commented 2 years ago

It looks as though we never see that Athens URL. It does not appear in the MARC, or in the JSON that arrives at the top of the pipeline.

pollecuttn commented 2 years ago

@paul-butcher the URL is in the 856 subfield u of the checkin record c11155176 that's attached to b33031058, if that helps.

Compare with https://wellcomecollection.org/works/xqypf9jt, which has the URL in the 856 subfield u of the checkin record c10892151, which is attached to b25059063.

paul-butcher commented 2 years ago

The transformer surfaces materialType in the format property, but "E-journals" is deliberately defined to map to "Journals": https://github.com/wellcomecollection/catalogue-pipeline/blob/b51c1e0434895dff22d7879449d3670c6007468a/common/internal_model/src/main/scala/weco/catalogue/internal_model/work/Format.scala#L140-L144

paul-butcher commented 2 years ago

1115517, as seen by the pipeline (running SierraLiveDataTransformerTest with bib number 3303105; it is returned in holdingsRecords), also does not contain that URL. Its created and updated dates are both "2022-03-22T21:57:20Z".
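The pipeline refers to records by their bare 7-digit record number (3303105, 1115517), while the links in this thread use the full Sierra ID with a type prefix and a trailing check digit (b33031058, c11155176). A minimal Python sketch of that relationship, assuming the usual Sierra mod-11 check-digit scheme; the function names are hypothetical, not part of the pipeline:

```python
def sierra_check_digit(record_number: str) -> str:
    """Check digit for a Sierra record number (assumed mod-11 scheme).

    Each digit is weighted by its position counted from the right,
    starting at 2; the weighted sum is taken modulo 11, and a
    remainder of 10 is written as "x".
    """
    total = sum(
        int(digit) * weight
        for weight, digit in enumerate(reversed(record_number), start=2)
    )
    remainder = total % 11
    return "x" if remainder == 10 else str(remainder)


def full_sierra_id(prefix: str, record_number: str) -> str:
    """Build the prefixed form, e.g. ("b", "3303105") -> "b33031058"."""
    return f"{prefix}{record_number}{sierra_check_digit(record_number)}"


if __name__ == "__main__":
    # The four records discussed in this issue.
    for prefix, number in [("b", "3303105"), ("c", "1115517"),
                           ("b", "2505906"), ("c", "1089215")]:
        print(full_sierra_id(prefix, number))
    # prints b33031058, c11155176, b25059063, c10892151 (one per line)
```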

paul-butcher commented 2 years ago

b33031058 has available=false and b25059063 has available=true. I wonder if that might be significant.

paul-butcher commented 2 years ago

The 856 field is not present in the JSON document on S3 that corresponds to c11155176. The only difference between c11155176 and c10892151 (apart from dates and IDs) seems to be that c11155176 has only these two varFields:

```json
"varFields": [
  {
    "fieldTag": "l",
    "content": "SLKe1000665"
  },
  {
    "fieldTag": "p",
    "content": "American Medical Association"
  }
]
```

c10892151 has those and a load of others, including the 856:

```json
{
  "fieldTag": "y",
  "marcTag": "856",
  "ind1": " ",
  "ind2": " ",
  "subfields": [
    {
      "tag": "u",
      "content": "http://resolver.ebscohost.com/Redirect/PRL?EPPackageLocationID=474.38171.313587&epcustomerid=s7451719"
    },
    {
      "tag": "z",
      "content": "Connect to JAMA"
    }
  ]
}
```

paul-butcher commented 2 years ago

So, by looking at the example given by @pollecuttn, I can see where the link information should be, and it's definitely not there in the JSON on S3. However, if I look at catalogue.wellcomecollection.org and search.wellcomecollection.org, there's no apparent difference between this record and the good record.
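Checking for the missing link can be automated. Below is a minimal Python sketch that pulls the 856 $u URLs out of a checkin record shaped like the JSON excerpts above. It assumes only the varFields / marcTag / subfields layout seen in those excerpts; the helper name is hypothetical and the records are trimmed to the fields quoted in this thread.

```python
def urls_from_856(record: dict) -> list:
    """Collect the subfield-u contents of every 856 varField in a record."""
    urls = []
    for var_field in record.get("varFields", []):
        # Fields such as "l" and "p" carry no marcTag, so they are skipped.
        if var_field.get("marcTag") != "856":
            continue
        for subfield in var_field.get("subfields", []):
            if subfield.get("tag") == "u":
                urls.append(subfield["content"])
    return urls


# c11155176 as it arrived on S3: only the two varFields quoted above.
c11155176 = {
    "varFields": [
        {"fieldTag": "l", "content": "SLKe1000665"},
        {"fieldTag": "p", "content": "American Medical Association"},
    ]
}

# c10892151, trimmed to its 856 field only (its other varFields are omitted).
c10892151 = {
    "varFields": [
        {
            "fieldTag": "y",
            "marcTag": "856",
            "ind1": " ",
            "ind2": " ",
            "subfields": [
                {
                    "tag": "u",
                    "content": "http://resolver.ebscohost.com/Redirect/PRL"
                    "?EPPackageLocationID=474.38171.313587&epcustomerid=s7451719",
                },
                {"tag": "z", "content": "Connect to JAMA"},
            ],
        }
    ]
}

print(urls_from_856(c11155176))  # []
print(urls_from_856(c10892151))  # the single EBSCO resolver URL
```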

paul-butcher commented 2 years ago

This is the change from today (a manual minor edit): s3://wellcomecollection-platform-sierra-adapter-20200604/records_holdings/2022-03-25T13-30-02Z__2022-03-25T13-46-02Z/0000.json

This is the one from the 22nd (part of the EBSCO coverage load update): s3://wellcomecollection-platform-sierra-adapter-20200604/records_holdings/records_holdings/2022-03-22T21-45-02Z__2022-03-22T22-01-02Z/0000.json
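The folder names in these keys appear to encode the time window that each adapter run covered. Assuming that reading (start__end, with "-" standing in for ":" in the time part, and both ends inclusive), a short Python sketch can confirm that the created/updated timestamp reported earlier, 2022-03-22T21:57:20Z, falls inside the window of the file from the 22nd. The helper names are hypothetical.

```python
import re
from datetime import datetime, timezone

# Folder names such as 2022-03-22T21-45-02Z__2022-03-22T22-01-02Z are
# assumed to mean "<start>__<end>".
WINDOW_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2}T\d{2}-\d{2}-\d{2}Z)__(\d{4}-\d{2}-\d{2}T\d{2}-\d{2}-\d{2}Z)"
)


def parse_window(s3_uri: str) -> tuple:
    """Return the (start, end) datetimes encoded in an adapter S3 key."""
    match = WINDOW_RE.search(s3_uri)
    if match is None:
        raise ValueError(f"no window folder in {s3_uri!r}")
    start, end = (
        datetime.strptime(part, "%Y-%m-%dT%H-%M-%SZ").replace(tzinfo=timezone.utc)
        for part in match.groups()
    )
    return start, end


def in_window(timestamp: str, s3_uri: str) -> bool:
    """Check whether a Sierra ISO timestamp falls inside a key's window.

    Treating both ends as inclusive is an assumption.
    """
    moment = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    start, end = parse_window(s3_uri)
    return start <= moment <= end


BUCKET = "s3://wellcomecollection-platform-sierra-adapter-20200604"
URI_22 = (BUCKET + "/records_holdings/records_holdings/"
          "2022-03-22T21-45-02Z__2022-03-22T22-01-02Z/0000.json")
URI_25 = (BUCKET + "/records_holdings/"
          "2022-03-25T13-30-02Z__2022-03-25T13-46-02Z/0000.json")

print(in_window("2022-03-22T21:57:20Z", URI_22))  # True
print(in_window("2022-03-22T21:57:20Z", URI_25))  # False
```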

paul-butcher commented 2 years ago

Editing the record and saving it made it work. I don't know what the underlying cause of this was, but it's fixed now. We should keep an eye out for it happening again with the next coverage load.