ucldc / rikolti

calisphere harvester 2.0
BSD 3-Clause "New" or "Revised" License

nuxeo.nuxeo validation results #705

Closed christinklez closed 9 months ago

christinklez commented 9 months ago

mapping issues (see UCI university archives reports):

expected is_shown_by issues? (we are no longer deep harvesting, so these thumbnails will instead be picked up as part of the content harvest process):

id mismatch issues:

expected validation discrepancies!

barbarahui commented 9 months ago

I'm not seeing any problem for collection 26142 re mapping creator. Can you re-run the validate_by_mapper DAG for it?

christinklez commented 9 months ago

@barbarahui you're right! I'm going to strike the "creator" mapping one from the list. Thanks!

barbarahui commented 9 months ago

@christinklez re the extra blank space appearing before dates: this is how the data is actually entered in Nuxeo. I've added whitespace trimming to the Nuxeo mapper, but if this is widespread then there was some sort of data entry issue...
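For illustration, the kind of whitespace trimming described above could look roughly like this. It is only a sketch, not the actual Nuxeo mapper code: the `strip_whitespace` helper and the sample record are hypothetical.

```python
def strip_whitespace(value):
    """Recursively trim leading/trailing whitespace from strings.

    Hypothetical helper: walks strings, lists, and dicts so that
    values such as " 1965-03-12" come out as "1965-03-12".
    """
    if isinstance(value, str):
        return value.strip()
    if isinstance(value, list):
        return [strip_whitespace(item) for item in value]
    if isinstance(value, dict):
        return {key: strip_whitespace(item) for key, item in value.items()}
    # Leave non-string scalars (None, numbers, booleans) untouched.
    return value


# Made-up sample record with a leading blank before one of the dates.
record = {"date": [" 1965-03-12", "circa 1970 "], "title": "Campus view"}
cleaned = strip_whitespace(record)
```

Trimming in the mapper normalizes the output, but the source values in Nuxeo would still carry the extra space until they are corrected there.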

barbarahui commented 9 months ago

Re thumbnail_source for complex objects, we need to replicate this logic, which grabs a thumbnail from a component object if it's available:

https://github.com/ucldc/harvester/blob/master/harvester/fetcher/nuxeo_fetcher.py#L79-L141

I created an issue for this work: https://github.com/ucldc/rikolti/issues/720
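As a rough sketch of the fallback described above (not the legacy harvester's code): use the parent's own thumbnail source when it has one, otherwise take the first component that provides one. The `pick_thumbnail_source` function, the dict layout, and the assumption that the parent is checked first are all hypothetical.

```python
from typing import Optional


def pick_thumbnail_source(parent: dict, components: list) -> Optional[dict]:
    """Choose a thumbnail source for a complex object.

    Assumption: each record is a dict that may carry a
    "thumbnail_source" entry; the real Nuxeo metadata differs.
    """
    # Prefer the parent object's own thumbnail source, if any.
    if parent.get("thumbnail_source"):
        return parent["thumbnail_source"]
    # Otherwise fall back to the first component that has one.
    for component in components:
        source = component.get("thumbnail_source")
        if source:
            return source
    # Neither the parent nor any component offers a thumbnail.
    return None


# Made-up example: an audio parent whose second component is an image.
parent = {"type": "audio"}
components = [
    {"type": "audio"},
    {"type": "image", "thumbnail_source": {"url": "https://example.org/a.jpg"}},
]
chosen = pick_thumbnail_source(parent, components)
```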

barbarahui commented 9 months ago

@christinklez the is_shown_by video example you provided above is actually a complex object, so it's the same issue as the audio example.

christinklez commented 9 months ago

Thanks so much, @barbarahui! Reviewed a set of collections and the mapping fixes look good. Thanks!