uga-libraries / format-report

Aggregate and analyze csv files with file format information generated by the UGA Libraries' digital preservation system (ARCHive).
Creative Commons Attribution Share Alike 4.0 International

Use pandas to merge format reports and standardize_formats.csv #13

Open amhanson9 opened 1 year ago

amhanson9 commented 1 year ago

Currently, format_check() and in_standard() have to iterate through standardize_formats.csv once for every format, until they find a match or have tested all the formats. It would be more streamlined to use pandas: read both into dataframes, drop duplicates from the formats, merge, and make a dataframe of any formats that did not match.
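A minimal sketch of that approach, not the script's actual code. The column names (`format_name`, `standard_name`, `format_type`) and the sample rows are hypothetical stand-ins for the format reports and standardize_formats.csv; in the real script both dataframes would come from `pd.read_csv()`.

```python
import pandas as pd

# Hypothetical stand-in for the combined format reports (would be read with pd.read_csv).
formats = pd.DataFrame({
    "format_name": [
        "JPEG File Interchange Format",
        "Waveform Audio",
        "JPEG File Interchange Format",  # duplicate, as formats repeat across reports
        "Mystery Format",                # not present in the standardization table
    ]
})

# Hypothetical stand-in for standardize_formats.csv.
standard = pd.DataFrame({
    "format_name": ["JPEG File Interchange Format", "Waveform Audio"],
    "standard_name": ["JPEG", "WAVE"],
    "format_type": ["image", "audio"],
})

# Drop duplicates so each format is looked up once.
unique_formats = formats.drop_duplicates(subset="format_name")

# A left merge keeps every format; indicator=True adds a "_merge" column
# that records whether each row found a match ("both") or not ("left_only").
merged = unique_formats.merge(standard, on="format_name", how="left", indicator=True)

# Formats with no match in the standardization table.
unmatched = merged.loc[merged["_merge"] == "left_only", ["format_name"]]

# Formats that matched, with the helper column removed.
matched = merged.loc[merged["_merge"] == "both"].drop(columns="_merge")
```

One merge replaces the per-format loop over the CSV, and `unmatched` covers what in_standard() currently checks for: formats that still need to be added to standardize_formats.csv.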

amhanson9 commented 1 year ago

In the merge script, once dataframes are used from the beginning, the NARA risk data could be added at that point instead of reading the merged CSV into a dataframe at the end of the script.
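A rough sketch of adding the risk data as one more merge on the in-memory dataframe. The join key (`format_name`), the `nara_risk_level` column, and the risk values are placeholders made up for illustration; the real NARA spreadsheet and the script's matching logic may differ.

```python
import pandas as pd

# Hypothetical merged format data, already held in a dataframe by this point.
merged_formats = pd.DataFrame({
    "format_name": ["JPEG File Interchange Format", "Mystery Format"],
    "standard_name": ["JPEG", None],
})

# Hypothetical stand-in for the NARA risk data; values are placeholders, not real ratings.
nara = pd.DataFrame({
    "format_name": ["JPEG File Interchange Format"],
    "nara_risk_level": ["Low Risk"],
})

# Left merge keeps every format row and leaves the risk column empty (NaN) where
# NARA has no entry. validate="many_to_one" raises an error if the NARA table
# has duplicate keys, which would otherwise silently duplicate format rows.
with_risk = merged_formats.merge(
    nara, on="format_name", how="left", validate="many_to_one"
)
```

With this shape the merged CSV would only need to be written once, at the end, rather than written and then read back in.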