amhanson9 closed this issue 10 months ago.
Made a separate script for this, fix_versions.py. It replaces the entire Format_Version column with the version values from Format_Identification instead of first testing whether the two already match, since that is simpler. Resource used: https://stackoverflow.com/questions/40705480/python-pandas-remove-everything-after-a-delimiter-in-a-string
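A minimal sketch of what a script like fix_versions.py could do. It uses only Python's standard csv module rather than pandas. A pandas version of the linked Stack Overflow approach would be something like `df['Format_Identification'].str.split('|').str[1]`. The Format_Version and Format_Identification column names come from this thread. Everything else is an assumption and not the actual script: the helper names, the file paths, the encodings, and the choice to leave a row unchanged when Format_Identification is not in name|version|PUID form. Because the csv module reads every value as a string, 2.0 and 3.10 are written back exactly as they appear in Format_Identification.

```python
import csv


def version_from_identification(identification):
    """Return the version from a 'name|version|PUID' string (hypothetical helper).

    Returns None when the value does not split into exactly three parts.
    """
    parts = identification.split("|")
    if len(parts) != 3:
        return None
    return parts[1]


def fix_versions(input_path, output_path):
    """Overwrite Format_Version with the version found in Format_Identification."""
    # Assumption: "utf-8-sig" also handles the byte-order mark Excel adds
    # when a file is saved as "CSV UTF-8".
    with open(input_path, newline="", encoding="utf-8-sig") as infile:
        reader = csv.DictReader(infile)
        fieldnames = reader.fieldnames
        rows = list(reader)

    for row in rows:
        version = version_from_identification(row["Format_Identification"])
        # Assumption: keep the existing value if Format_Identification
        # is not in the expected name|version|PUID shape.
        if version is not None:
            row["Format_Version"] = version

    # Values are written as plain strings, so trailing zeros are preserved.
    with open(output_path, "w", newline="", encoding="utf-8") as outfile:
        writer = csv.DictWriter(outfile, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Usage, with hypothetical file names: `fix_versions("format_report.csv", "format_report_fixed.csv")`. The fixed file should not be re-saved from Excel, or the trailing zeros will be stripped again.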
Format reports (CSV) are opened in Excel so they can be manually matched to NARA risk data using Excel's filtering and sorting functionality. Excel treats the format version column as a number, so trailing zeros are incorrectly removed. For example, 2.0 becomes 2 and 3.10 becomes 3.1. The correct version number is still present in the format identification column, which is formatted name|version|PUID.
After the CSVs have been updated with NARA risk data, the format version column needs to be repopulated with the version from the format identification column.