uga-libraries / accessioning-scripts

Scripts used for accessioning born-digital archives
Creative Commons Attribution Share Alike 4.0 International

NARA Risk Matching: extracting version #42

Open amhanson9 opened 1 year ago

amhanson9 commented 1 year ago

Location: match_nara_risk()

Description: Currently, anything after the last space in the NARA Format Name is used as the version. Some formats have a version that does not fit this pattern. Two patterns still place the version after the last space but wrap it in additional characters that need to be removed: "name (version)" and "name v.version". These might be easy to implement.
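A minimal sketch of how the two extra patterns could sit in front of the existing last-space rule. The function name `split_version` and the sample format names in the usage note are hypothetical and are not taken from `match_nara_risk()` or from the NARA spreadsheet.

```python
import re

# Hypothetical helper for illustration; not the code currently in match_nara_risk().
# "name (version)": version is wrapped in parentheses after the last space.
PAREN_PATTERN = re.compile(r"^(?P<name>.+?)\s+\((?P<version>[^()\s]+)\)$")
# "name v.version": version is prefixed with "v." after the last space.
V_DOT_PATTERN = re.compile(r"^(?P<name>.+?)\s+v\.(?P<version>\S+)$")


def split_version(format_name):
    """Return (name, version) for a format name string; version may be None."""
    text = format_name.strip()

    # Try the more specific patterns first so the wrapping characters are removed.
    for pattern in (PAREN_PATTERN, V_DOT_PATTERN):
        match = pattern.match(text)
        if match:
            return match.group("name"), match.group("version")

    # Fallback: the rule described in this issue (everything after the last space).
    name, separator, version = text.rpartition(" ")
    if separator:
        return name, version
    return text, None
```

With made-up names, `split_version("Example Format (1.0)")` and `split_version("Example Format v.1.0")` would both give `("Example Format", "1.0")`. The fallback keeps the existing limitation that the last word is treated as a version even when it is not one.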

Priority: waiting to see how often formats with these patterns appear in our accessions.

amhanson9 commented 1 year ago

From reviewing 2023 accessioning data, PDF/A is common enough that it may be worth extracting the version from its pattern as well: Portable Document Format/Archiving (PDF/A-VERSION) OPTIONAL-TEXT
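A rough sketch of a regular expression for the PDF/A pattern, assuming the version is whatever follows `PDF/A-` inside the parentheses and that any trailing text can be ignored. The name `split_pdfa` and the sample strings in the usage note are hypothetical.

```python
import re

# Hypothetical pattern for: Portable Document Format/Archiving (PDF/A-VERSION) OPTIONAL-TEXT
PDFA_PATTERN = re.compile(
    r"^(?P<name>.+?)\s*"                 # format name before the parentheses
    r"\(PDF/A-(?P<version>[^()\s]+)\)"   # version inside "(PDF/A-...)"
    r"(?:\s+(?P<extra>.+))?$"            # optional trailing text, ignored here
)


def split_pdfa(format_name):
    """Return (name, version) if the string fits the PDF/A pattern, else None."""
    match = PDFA_PATTERN.match(format_name.strip())
    if match is None:
        return None
    return match.group("name"), match.group("version")
```

For an illustrative input such as `"Portable Document Format/Archiving (PDF/A-1a) some extra text"`, this would return the name and `"1a"`. Strings without `(PDF/A-...)` return `None`, so the other version rules can still be applied to them.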

amhanson9 commented 1 year ago

As we add new patterns for extracting the version from NARA, can we also use these patterns to combine the FITS name and version, to help with those matches?
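One way this could look, as a sketch only: build the combined strings a NARA Format Name might use from the FITS name and version, then compare. The function names, the case-insensitive comparison, and the list of templates are all assumptions for illustration, not existing behavior in the script.

```python
# Hypothetical helpers; the templates mirror the version patterns discussed in this issue.
TEMPLATES = ("{name} {version}", "{name} ({version})", "{name} v.{version}")


def candidate_names(fits_name, fits_version):
    """Return lowercase name+version strings to compare against NARA Format Name."""
    name = fits_name.strip()
    version = (fits_version or "").strip()
    if not version:
        # No version from FITS: only the name itself can be compared.
        return [name.lower()]
    return [t.format(name=name, version=version).lower() for t in TEMPLATES]


def matches_nara(fits_name, fits_version, nara_format_name):
    """True if the NARA name equals one of the combined FITS candidates (case-insensitive)."""
    return nara_format_name.strip().lower() in candidate_names(fits_name, fits_version)
```

With made-up values, `matches_nara("Example Format", "1.0", "Example Format (1.0)")` would be `True`. Each new version pattern would then need only one more entry in `TEMPLATES`.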

amhanson9 commented 1 year ago

Per a May 2023 conversation with NARA, they plan to split the version number into a separate column within the next 6 months, so this may no longer be necessary.