sul-dlss / dlme-transform

Transforms raw DLME metadata to DLME intermediate representation
Apache License 2.0

Bad urls should not break application #759

Closed jacobthill closed 3 years ago

jacobthill commented 4 years ago

Currently, if a bad url is loaded in the agg_preview field (and probably agg_is_shown_at as well), the following error message is displayed when clicking on the set of records containing the bad url (e.g. by selecting the data contributor):

[Screenshot of the error message: Screen Shot 2020-08-21 at 9 20 28 AM]

Selecting exhibit dashboard to unload the records results in the same message.

Loading bad urls is part of the transformation process. I have error checking locally, but it only tests whether the url is valid, not whether it resolves. I also forget to run that check sometimes. In some cases, particularly when the collection is large, it is difficult to find the record with the bad url.

Desired behavior:

aaron-collier commented 3 years ago

One option for this issue may be to expand the validation around web resources so that it checks that the URL is valid and that it resolves: https://github.com/sul-dlss/dlme-transform/blob/main/lib/contracts/cho.rb#L125
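To make the "is it valid" half concrete, here is a minimal sketch of a syntactic check using only Ruby's standard `uri` library. The helper name `valid_web_url?` is hypothetical and is not taken from `cho.rb`; it only illustrates the kind of predicate a validation rule could call, and it does not make any network request.

```ruby
require 'uri'

# Hypothetical helper, not part of dlme-transform.
# Returns true only for absolute http(s) URLs that parse cleanly and have a host.
def valid_web_url?(value)
  return false unless value.is_a?(String)

  uri = URI.parse(value.strip)
  # URI::HTTPS is a subclass of URI::HTTP, so this covers both schemes.
  uri.is_a?(URI::HTTP) && !uri.host.nil? && !uri.host.empty?
rescue URI::InvalidURIError
  # Raised for strings such as "not a url" that cannot be parsed at all.
  false
end
```

A check like this would catch values that do not look like URLs at all, but it says nothing about whether the target still exists.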

jacobthill commented 3 years ago

This is captured in the google doc, but maybe first we should check to see if they are valid. I'm pretty sure the application doesn't break when a url doesn't resolve, as long as it's valid. It seems to break only when you pass something that doesn't look like a valid url. The other issue, of course, is that when we re-harvest a set we should test urls and suppress those that no longer resolve, so we keep the site clean of broken urls.
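A rough sketch of the re-harvest idea, assuming each record is a Hash whose web-resource fields hold a hash with the URL under a `wr_id` key. That record shape, the helper names `url_resolves?` and `suppress_unresolvable`, and the 5-second timeout are all assumptions made for illustration, not the project's actual code. The resolver is passed in as a parameter so the filtering can be tested without network access.

```ruby
require 'net/http'
require 'openssl'
require 'uri'

# Fields named in this issue; the list is an assumption, not exhaustive.
WEB_RESOURCE_FIELDS = %w[agg_preview agg_is_shown_at].freeze

# Hypothetical helper: true when a HEAD request returns a 2xx or 3xx status.
def url_resolves?(url, timeout: 5)
  uri = URI.parse(url)
  return false unless uri.is_a?(URI::HTTP) && uri.host

  response = Net::HTTP.start(uri.host, uri.port,
                             use_ssl: uri.scheme == 'https',
                             open_timeout: timeout,
                             read_timeout: timeout) do |http|
    http.head(uri.request_uri)
  end
  response.is_a?(Net::HTTPSuccess) || response.is_a?(Net::HTTPRedirection)
rescue URI::InvalidURIError, SocketError, SystemCallError,
       Timeout::Error, OpenSSL::SSL::SSLError
  # Unparseable, unreachable, or timed-out URLs count as not resolving.
  false
end

# Returns a copy of the record without web-resource fields whose URL
# fails the checker. Other fields are left untouched.
def suppress_unresolvable(record, checker: method(:url_resolves?))
  record.reject do |field, value|
    next false unless WEB_RESOURCE_FIELDS.include?(field)

    url = value.is_a?(Hash) ? value['wr_id'].to_s : value.to_s
    !checker.call(url)
  end
end
```

Some servers reject HEAD requests, so a real check might need to fall back to GET; the sketch ignores that case.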

aaron-collier commented 3 years ago

@jacobthill I ran a test on the QNL data in which we do not add the agg_preview if there isn't a manifest. This resolved the breaking issue after re-transforming/indexing.

However, as stated elsewhere, I think this is only a half measure. For discussion: what do we want to include in agg_preview (if anything) when there is no thumbnail URL or it is invalid?