christinklez closed this 9 months ago
I'm not seeing any problem with the creator mapping for collection 26142. Can you re-run the validate_by_mapper DAG for it?
@barbarahui you're right! I'm going to strike the "creator" mapping item from the list. Thanks!
@christinklez re: the extra blank space appearing before dates: this is how the data is actually entered in Nuxeo. I've added whitespace trimming to the Nuxeo mapper, but if this is widespread, then there was some sort of data entry issue...
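For illustration, whitespace trimming of the kind described above might look roughly like this. It is a minimal sketch, assuming a mapped record is a plain dict whose values are strings, lists, or nested dicts. The function name trim_whitespace is hypothetical and is not the actual Rikolti Nuxeo mapper code.

```python
from typing import Any


def trim_whitespace(value: Any) -> Any:
    """Recursively strip leading/trailing whitespace from strings.

    Hypothetical helper: assumes mapped metadata is built from dicts,
    lists, and strings; any other type is returned unchanged.
    """
    if isinstance(value, str):
        return value.strip()
    if isinstance(value, list):
        return [trim_whitespace(v) for v in value]
    if isinstance(value, dict):
        return {k: trim_whitespace(v) for k, v in value.items()}
    return value


# Example: a date entered in Nuxeo with a leading blank space
record = {"date": [" 1925"], "title": ["Field notes "]}
print(trim_whitespace(record))
# {'date': ['1925'], 'title': ['Field notes']}
```

Trimming at mapping time normalizes the output, but it does not change the source values in Nuxeo, so a widespread data entry issue would still need to be fixed there.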
Re: thumbnail_source
For complex objects, we need to replicate this logic, which grabs a thumbnail from a component object if one is available:
https://github.com/ucldc/harvester/blob/master/harvester/fetcher/nuxeo_fetcher.py#L79-L141
I created an issue for this work: https://github.com/ucldc/rikolti/issues/720
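The component fallback described above could be sketched roughly as follows. This is not a port of the legacy nuxeo_fetcher.py code linked above; it assumes each record is a dict with a hypothetical "thumbnail_url" key and that component records have already been fetched in display order. The function name select_thumbnail_source is likewise hypothetical.

```python
from typing import List, Optional


def select_thumbnail_source(parent: dict, components: List[dict]) -> Optional[str]:
    """Pick a thumbnail URL for a (possibly complex) object.

    Hypothetical sketch: "thumbnail_url" is an assumed key name,
    not a real Nuxeo or Rikolti field.
    """
    # Prefer a thumbnail at the parent level, if one exists.
    if parent.get("thumbnail_url"):
        return parent["thumbnail_url"]
    # Complex objects may have no file at the parent level, so fall back
    # to the first component that has a thumbnail.
    for component in components:
        url = component.get("thumbnail_url")
        if url:
            return url
    # No thumbnail available on the parent or any component.
    return None


parent = {"id": "parent-1"}
components = [{"id": "c1"}, {"id": "c2", "thumbnail_url": "https://example.org/c2.jpg"}]
print(select_thumbnail_source(parent, components))
# https://example.org/c2.jpg
```

Checking components in order keeps the choice deterministic; the legacy logic may apply additional rules (for example, filtering by file type) that this sketch does not attempt to reproduce.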
@christinklez the is_shown_by example you provided above for video is actually for a complex object. It's the same issue as the audio example.
Thanks so much, @barbarahui! I reviewed a set of collections and the mapping fixes look good. Thanks!
Mapping issues (see UCI University Archives reports):
- creator: map creator (see UCR WRCA #26142) [this is fetching/mapping fine; no fix needed]

Expected is_shown_by issues (since we will not be deep harvesting anymore); we will get these thumbnails as part of the content harvest process:
- This is a main content video file
- This is a complex object with no file at the parent level; legacy provides an S3 URL for a thumbnail image, so Rikolti needs updated logic (see UCB ESL #27549)

id mismatch issues:

Expected validation discrepancies!
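When reviewing validation output, it can help to separate discrepancies that are already expected (such as the is_shown_by differences above) from ones that need attention. A minimal sketch, assuming a report is a list of dicts with hypothetical "id" and "field" keys; the EXPECTED_FIELDS set is illustrative only and is not part of validate_by_mapper.

```python
from typing import Dict, List

# Fields whose legacy/Rikolti differences are expected (illustrative assumption).
EXPECTED_FIELDS = {"is_shown_by"}


def split_discrepancies(report: List[Dict]) -> Dict[str, List[Dict]]:
    """Partition discrepancy rows into expected and unexpected groups.

    Hypothetical helper: assumes each row carries a "field" key naming
    the metadata field that differed.
    """
    result: Dict[str, List[Dict]] = {"expected": [], "unexpected": []}
    for row in report:
        bucket = "expected" if row.get("field") in EXPECTED_FIELDS else "unexpected"
        result[bucket].append(row)
    return result


report = [
    {"id": "a", "field": "is_shown_by"},
    {"id": "b", "field": "creator"},
]
groups = split_discrepancies(report)
print([row["id"] for row in groups["unexpected"]])
# ['b']
```

Keeping the expected set explicit makes it easy to remove a field from it once the corresponding Rikolti logic (for example, the complex-object thumbnail fallback) is in place.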