uga-libraries / format-report

Aggregate and analyze csv files with file format information generated by the UGA Libraries' digital preservation system (ARCHive).
Creative Commons Attribution Share Alike 4.0 International

Fix Excel stripping 0 from versions #59

Closed: amhanson9 closed this issue 8 months ago

amhanson9 commented 9 months ago

Format reports (CSV) are opened in Excel so that its filtering and sorting functionality can be used while manually matching formats to NARA risk data. Excel treats the format version column as a number, so trailing zeros are incorrectly removed: for example, 2.0 becomes 2 and 3.10 becomes 3.1. The version number is still correct in the format identification column, which stores the information as name|version|PUID.
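A rough Python sketch of the problem, assuming Excel behaves like a plain float conversion for simple version strings (its real parsing, including date detection and locale handling, is more complex). The helpers `excel_like_number` and `version_from_identification` are hypothetical and not part of the repository; the sample identification string is made up.

```python
def excel_like_number(version: str) -> str:
    """Rough stand-in for Excel reading a version as a number (hypothetical helper).

    Values that parse as numbers lose trailing zeros; anything else, such as
    "1.2.3", is left as text. Excel's real rules (dates, locales) are not modeled.
    """
    try:
        value = float(version)
    except ValueError:
        return version
    return str(int(value)) if value.is_integer() else str(value)


def version_from_identification(identification: str) -> str:
    """Return the version from a name|version|PUID string (hypothetical helper)."""
    # Split from the right so a "|" inside the format name does not shift fields.
    _name, version, _puid = identification.rsplit("|", 2)
    return version


for original in ["2.0", "3.10", "1.2.3"]:
    print(original, "->", excel_like_number(original))
# 2.0 -> 2
# 3.10 -> 3.1
# 1.2.3 -> 1.2.3

print(version_from_identification("Format B|3.10|puid-b"))  # 3.10
```

The numeric conversion is where the information is lost; the identification column is never converted, so its middle field still holds the original version text.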

After the CSVs have been updated with NARA risk data, the format version column needs to be restored from the version stored in the format identification column.

amhanson9 commented 9 months ago

Made a separate script for this, fix_versions.py. It replaces the entire Format_Version column with the version values from Format_Identification, rather than first testing whether the two values match, because that is simpler. Resource used: https://stackoverflow.com/questions/40705480/python-pandas-remove-everything-after-a-delimiter-in-a-string
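A minimal pandas sketch of the same approach, assuming the report has `Format_Version` and `Format_Identification` columns and that every identification value is name|version|PUID. The `fix_versions` function below is hypothetical and is not the actual contents of fix_versions.py; the sample rows are made up. It uses pandas string splitting (`Series.str.rsplit`) and keeps the middle field.

```python
import io

import pandas as pd


def fix_versions(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with Format_Version rebuilt from Format_Identification.

    Assumes Format_Identification holds name|version|PUID. Splitting from the
    right with n=2 keeps any "|" inside the name from shifting the version field.
    Rows with fewer than two "|" get a missing value (NaN) for the version.
    """
    fixed = df.copy()
    # Every row is overwritten; there is no check for whether Excel changed it.
    fixed["Format_Version"] = (
        fixed["Format_Identification"].str.rsplit("|", n=2).str[1]
    )
    return fixed


# Made-up rows: Format_Version shows the values Excel saved after stripping zeros.
# dtype=str stops pandas from converting the columns to numbers when reading.
sample_csv = io.StringIO(
    "Format_Version,Format_Identification\n"
    "2,Format A|2.0|puid-a\n"
    "3.1,Format B|3.10|puid-b\n"
)
report = pd.read_csv(sample_csv, dtype=str)
fixed = fix_versions(report)
print(fixed["Format_Version"].tolist())  # ['2.0', '3.10']
```

For a real report, reading with `pd.read_csv(path, dtype=str)` and writing with `fixed.to_csv(path, index=False)` keeps pandas from reinterpreting other columns as numbers. Opening the fixed file directly in Excel and saving it again would strip the zeros again.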