uga-libraries / hub-monitoring

Scripts for summarizing and validating content on the Digital Production Hub, the UGA Libraries' centralized storage for digital objects that are not suitable for our digital preservation system.
Creative Commons Attribution Share Alike 4.0 International

Error handling for NARA CSV column names #69

Closed amhanson9 closed 2 months ago

amhanson9 commented 4 months ago

If an older version of the NARA Preservation Action Plan spreadsheet is used, it may have outdated column names. That causes a KeyError in match_nara_risk(), which is called by new_risk_spreadsheet(). The column names are used throughout match_nara_risk(), so it might be simpler to put a try/except around the function call in new_risk_spreadsheet() than to handle each use separately.

If this error happens, print an explanation and quit the script. The default KeyError message does make some sense, but a little more information would make it easier to understand how to fix the problem. This will also catch the error the next time NARA changes their column names.
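A minimal sketch of the try/except idea described above, not the repository's actual code. The column name `NARA Risk Level`, the function signatures, and the list-of-dicts data are assumptions for illustration only; the real functions work with the script's own data structures.

```python
import sys

# Hypothetical column name; the real NARA spreadsheet header may differ.
RISK_COLUMN = "NARA Risk Level"


def match_nara_risk(nara_rows):
    """Stand-in for the real function: looks up NARA columns by name."""
    # A missing key raises KeyError, like an outdated column name would.
    return [row[RISK_COLUMN] for row in nara_rows]


def new_risk_spreadsheet(nara_rows):
    """Wrap the call so an outdated column name gives a clearer message."""
    try:
        return match_nara_risk(nara_rows)
    except KeyError as error:
        # str(error) is the missing key, so the user sees which name failed.
        print(f"The NARA CSV does not have the expected column {error}.")
        print("Use the current NARA Preservation Action Plan spreadsheet "
              "or update the column names in the script.")
        sys.exit(1)
```

Wrapping the single call keeps the many column lookups inside match_nara_risk() unchanged, at the cost of only reporting the first missing name.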

amhanson9 commented 4 months ago

I decided to incorporate this into read_nara_csv() instead. It seemed more logical to test the csv at the time it is read, so the script doesn't spend time navigating to a risk csv only to fail. As part of this, nara_df now only includes the 5 columns we use instead of all of them, so only the names of the columns we use can cause an error.
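A rough sketch of checking the columns at read time, under stated assumptions. The five names in `NARA_COLUMNS` are guesses, not confirmed by this thread, and `check_nara_columns()` is a hypothetical helper. The name nara_df suggests the script uses pandas; this version uses the standard library's csv module only so the example has no dependencies.

```python
import csv
import sys

# Assumed names for the five columns the script uses (not confirmed here).
NARA_COLUMNS = [
    "Format Name",
    "File Extension(s)",
    "PRONOM URL",
    "NARA Risk Level",
    "NARA Proposed Preservation Plan",
]


def check_nara_columns(header):
    """Return the required column names that are absent from the header."""
    return [name for name in NARA_COLUMNS if name not in header]


def read_nara_csv(nara_csv_path):
    """Read the NARA CSV, keeping only the required columns.

    Quits with an explanation if any required column is missing, so the
    problem surfaces before any risk CSV is processed.
    """
    with open(nara_csv_path, newline="", encoding="utf-8") as nara_file:
        reader = csv.DictReader(nara_file)
        missing = check_nara_columns(reader.fieldnames or [])
        if missing:
            print("The NARA CSV is missing expected columns:", ", ".join(missing))
            print("Use the current NARA spreadsheet or update the column "
                  "names in the script.")
            sys.exit(1)
        # Extra columns are dropped, so renaming them cannot cause an error.
        return [{name: row[name] for name in NARA_COLUMNS} for row in reader]
```

Because only the listed columns are checked and kept, a rename of any other NARA column has no effect on the script.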