uga-libraries / format-report

Aggregate and analyze CSV files with file format information generated by the UGA Libraries' digital preservation system (ARCHive).
Creative Commons Attribution Share Alike 4.0 International

Use pandas explode() to make by_aip CSV? #38

Open amhanson9 opened 1 year ago

amhanson9 commented 1 year ago

In the format analysis script for accessioning (https://github.com/uga-libraries/accessioning-scripts/blob/main/format_analysis_functions.py), the match_nara_risk() function uses pandas to convert the NARA CSV, which has a delimited column of file extensions, into one row per file extension, repeating all other information for each extension. Would the same approach be a simpler way to get the by_aip CSV from by_group, instead of calculating both at the same time?
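A rough sketch of what that could look like, in case it helps evaluate the idea. This is not the code from match_nara_risk() or from this repo: the function name by_group_to_by_aip, the column names (Group, Format_Name, File_Count, AIP_IDs), and the ";" delimiter are all assumptions made up for illustration, and the real by_group columns may differ.

```python
import pandas as pd


def by_group_to_by_aip(by_group_df, id_column="AIP_IDs", delimiter=";"):
    """Return one row per AIP ID, repeating the other columns for each ID.

    Hypothetical helper: id_column and delimiter are assumptions,
    not names taken from the format-report scripts.
    """
    df = by_group_df.copy()
    # Turn the delimited string into a list of IDs in each row.
    df[id_column] = df[id_column].str.split(delimiter)
    # explode() makes one row per list item and copies the other columns.
    df = df.explode(id_column).reset_index(drop=True)
    # Remove any stray whitespace left around the IDs.
    df[id_column] = df[id_column].str.strip()
    return df


# Made-up data standing in for the by_group CSV.
by_group = pd.DataFrame({
    "Group": ["group-a", "group-b"],
    "Format_Name": ["PDF", "TIFF"],
    "File_Count": [10, 3],
    "AIP_IDs": ["aip-1; aip-2; aip-3", "aip-4"],
})

by_aip = by_group_to_by_aip(by_group)
print(by_aip)
```

One thing to check before relying on this: any per-group totals (like File_Count in the sketch) are repeated unchanged on every exploded row, so they would still describe the group rather than the individual AIP. If by_aip needs counts per AIP, those would still have to be calculated separately.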