wellcomecollection / platform

Wellcome Collection Digital Platform
https://developers.wellcomecollection.org/
MIT License

Unrecognised accessConditions in the data coming to the METS transformer #4871

Closed alexwlchan closed 3 years ago

alexwlchan commented 3 years ago

We have a handful of source records which are failing in the METS transformer (based on the 2020-11-12 pipeline).

Some of these are caused by accessconditions in the METS that the transformer can't recognise:

<mods:accessCondition type="dz">A</mods:accessCondition>

Other records have the values B, J and creativecommons.org/licenses/by/4.0 in this element.

We should either fix these in the METS source data, or update the transformer to handle them.
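For anyone wanting to find affected records before deciding between the two options, here is a minimal sketch of the kind of check involved. It is not the transformer's actual code. It assumes the licence lives in a MODS `accessCondition` element with `type="dz"` (as in the example above; the MODS schema spells the element in camelCase), and the `KNOWN_LICENCE_CODES` set and the function name are hypothetical placeholders rather than the real list the transformer recognises.

```python
import xml.etree.ElementTree as ET

# The MODS v3 namespace, as used inside METS files.
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical allow-list for illustration only; the real transformer
# has its own set of licence codes it can map.
KNOWN_LICENCE_CODES = {"CC-BY", "CC-BY-NC", "PDM"}


def unrecognised_access_conditions(mets_xml, known=KNOWN_LICENCE_CODES):
    """Return the type="dz" accessCondition values that are not in `known`."""
    root = ET.fromstring(mets_xml)
    unrecognised = []
    for element in root.iterfind(".//mods:accessCondition[@type='dz']", MODS_NS):
        value = (element.text or "").strip()
        # Compare case-insensitively so "cc-by" and "CC-BY" are treated alike.
        if value.upper() not in known:
            unrecognised.append(value)
    return unrecognised


# A made-up fragment, not a real record.
SAMPLE = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:accessCondition type="dz">A</mods:accessCondition>
  <mods:accessCondition type="dz">CC-BY</mods:accessCondition>
</mods:mods>"""

print(unrecognised_access_conditions(SAMPLE))  # ['A']
```

Running something like this over the failing records would give a list of values to either correct at source or add to the transformer's mapping.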

tomcrane commented 3 years ago

@alexwlchan I don't know if this is useful or not but here's what DDS makes of these:

https://github.com/wellcomecollection/iiif-builder/blob/master/src/Wellcome.Dds/Wellcome.Dds.Repositories/Presentation/LicencesAndRights/LicenceCodes.cs

and

https://github.com/wellcomecollection/iiif-builder/blob/master/src/Wellcome.Dds/Wellcome.Dds.Repositories/Presentation/LicencesAndRights/LicenceMap.cs#L18

alexwlchan commented 3 years ago

Thanks Tom. I suspect we'll just fix these up at source, given there are so few of them, rather than perpetuate these license codes into another system.

aray-wellcome commented 3 years ago

@alexwlchan Most of these are items that we no longer want and I've "deleted" them from the storage service so I guess they need removed from the pipeline somehow but I'm not sure how that needs to be done.

Deleted

b16774073 is a film that needs re-ingested but it's corrupted. I've "deleted" it from the storage service until it can be re-digitized.

b1665836x is a film that needs re-ingested but it's corrupted. I've "deleted" it from the storage service until it can be re-digitized.

b18022406 has been deleted from Goobi and the Storage Service

b17222333 has been deleted from Goobi and the Storage Service

b1616006x has been deleted from Goobi and the Storage Service

b17222527 has been deleted from Goobi and the Storage Service

b16159949 has been deleted from Goobi and the Storage Service

b16160083 has been deleted from Goobi and the Storage Service

b18194448 has been deleted from Goobi and the Storage Service

Updated license

b20456645_0001 and b20456645_0002 have had their licenses updated to CC-BY now

alexwlchan commented 3 years ago

> they need removed from the pipeline somehow but I'm not sure how that needs to be done

That happens automatically when something gets updated in the storage service, so your changes should have fixed it. Thanks!

I'll double-check they came through the pipeline correctly, then close this ticket.

aray-wellcome commented 3 years ago

A majority of the deleted ones were deleted a while ago, not today, so I'm not sure they'll remove themselves if they haven't already.

alexwlchan commented 3 years ago

Huh, weird. We may need to reindex, okay.

alexwlchan commented 3 years ago

Closing in favour of https://github.com/wellcomecollection/platform/issues/4893