uga-libraries / format-report

Aggregate and analyze csv files with file format information generated by the UGA Libraries' digital preservation system (ARCHive).
Creative Commons Attribution Share Alike 4.0 International

Fix Excel stripping 0 from versions #59

Closed by amhanson9 10 months ago

amhanson9 commented 11 months ago

Format reports (CSV) are opened in Excel so that its filtering and sorting can be used while manually matching formats to NARA risk data. Excel treats the format version column as a number, which incorrectly removes trailing zeros. For example, 2.0 becomes 2 and 3.10 becomes 3.1. The version number is still correct in the format identification column, which holds the information in the form name|version|PUID.
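To illustrate the problem, here is a minimal sketch in Python. It does not reproduce Excel itself: `excel_like_display` is a hypothetical helper that only approximates the effect of parsing a version as a number and showing it in a general number format. `version_from_identification` is also hypothetical and assumes the format name contains no `|` character. The sample identification value is made up.

```python
def excel_like_display(version: str) -> str:
    """Approximate a spreadsheet treating a version string as a number.

    Values that parse as numbers lose trailing zeros; anything else
    (for example "1.0.1") is left as text.
    """
    try:
        return f"{float(version):g}"
    except ValueError:
        return version


def version_from_identification(identification: str) -> str:
    """Return the version field from a name|version|PUID string."""
    return identification.split("|")[1]


print(excel_like_display("2.0"))   # 2
print(excel_like_display("3.10"))  # 3.1

# The identification column is text, so the original version survives there.
print(version_from_identification("Example Format|3.10|fmt/0"))  # 3.10
```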

After the CSVs have been updated with NARA risk data, the format version column needs to be overwritten with the version from the format identification column.

amhanson9 commented 11 months ago

Made a separate script for this, fix_versions.py. It replaces the entire Format_Version column with the version values from Format_Identification, rather than first testing whether the two are the same, since that is simpler. Resource used: https://stackoverflow.com/questions/40705480/python-pandas-remove-everything-after-a-delimiter-in-a-string
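The approach described here could look roughly like the following pandas sketch. This is not the contents of fix_versions.py. The function name `fix_versions`, the sample rows, and the `Format_Name` column are assumptions made for illustration; only the `Format_Version` and `Format_Identification` column names and the name|version|PUID layout come from this issue. It also assumes the format name never contains `|`.

```python
import io

import pandas as pd


def fix_versions(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with Format_Version rebuilt from Format_Identification."""
    fixed = df.copy()
    # Format_Identification is name|version|PUID, so the version is the
    # second field. Every row is overwritten, with no comparison first.
    fixed["Format_Version"] = fixed["Format_Identification"].str.split("|").str[1]
    return fixed


# Made-up sample rows standing in for a report that was saved from Excel.
sample_csv = io.StringIO(
    "Format_Name,Format_Version,Format_Identification\n"
    "Example Format A,1.4,Example Format A|1.4|fmt/0\n"
    "Example Format B,2,Example Format B|2.0|fmt/0\n"
    "Example Format C,3.1,Example Format C|3.10|fmt/0\n"
)

# dtype=str keeps pandas from re-parsing versions as numbers on load.
df = pd.read_csv(sample_csv, dtype=str, keep_default_na=False)
fixed = fix_versions(df)
print(fixed["Format_Version"].tolist())  # ['1.4', '2.0', '3.10']
```

Reading with `dtype=str` matters as much as the replacement itself: without it, pandas would infer a float column and the version strings would again stop being exact text.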