uga-libraries / hub-audit

Compare the Digital Production Hub share contents to the inventory to identify information that needs updating.
Creative Commons Attribution Share Alike 4.0 International

Inventory Validation #3

Closed amhanson9 closed 3 months ago

amhanson9 commented 6 months ago

All required columns (the first five) must have data. No validation of the content is required, since the only column with a controlled vocabulary is already restricted by the spreadsheet.

All dates in "Date to review for deletion (required)" must be in the future. Most values are actual dates, but some may be "Permanent" or a length of time, e.g., 6 months. "Permanent" is acceptable. Mark a length of time for manual review.

If errors are found, add a note to the Audit_Result column.

amhanson9 commented 6 months ago

Required columns, after they are renamed by the script, are Share, Folder, Use, Responsible, and Review_Date. Folder is optional if it would repeat the share, but in practice Hub users are repeating the share in Folder.

Used https://datagy.io/pandas-conditional-column/ to put "Missing required data" in the Audit_Result column if any required column is blank. Did not get it to list which columns are missing, and it isn't worth the effort. The result still includes the inventory data, so we can see which cells are blank.
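For reference, a minimal sketch of that check, assuming the inventory has already been read into a pandas DataFrame with the renamed columns. The function name `flag_missing_required` and the sample rows are made up for illustration and are not taken from the script. It also assumes Audit_Result does not exist yet, so it creates the column.

```python
import pandas as pd

# Required columns, using the names from the comment above.
REQUIRED = ["Share", "Folder", "Use", "Responsible", "Review_Date"]


def flag_missing_required(df):
    """Return a copy of df with 'Missing required data' in Audit_Result
    for every row where at least one required column is blank."""
    out = df.copy()
    # A cell counts as blank if it is NaN/None or only whitespace.
    blank = out[REQUIRED].apply(
        lambda col: col.isna() | col.astype(str).str.strip().eq("")
    )
    missing = blank.any(axis=1)
    # Assumption: Audit_Result is created here rather than appended to.
    out["Audit_Result"] = ""
    out.loc[missing, "Audit_Result"] = "Missing required data"
    return out


# Made-up rows for illustration only.
inventory = pd.DataFrame({
    "Share": ["share_a", "share_b", "share_c"],
    "Folder": ["share_a", "", "share_c"],
    "Use": ["scans", "audio", "video"],
    "Responsible": ["Person A", "Person B", None],
    "Review_Date": ["2030-01-01", "Permanent", "2031-05-01"],
})
audited = flag_missing_required(inventory)
```

Like the script, this does not say which column is blank. The inventory columns are kept in the output, so the blank cells can be spotted by eye.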

amhanson9 commented 6 months ago

For the dates to review for deletion, the script only needs to verify values that are specific dates.

There are also textual ones (e.g., 6 months from creation), which the script should flag for manual review. Don't try to parse the length of time and calculate a date for these, because there aren't many. Textual dates that are "Permanent" or "permanent" do not need to be flagged.
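A rough sketch of the date rules as described in this thread, assuming Review_Date holds a mix of real dates and text. The function names, the note wording, and the sample values are hypothetical. Blanks are skipped here on the assumption that the required-column check already reports them.

```python
import pandas as pd

# Hypothetical note text; the wording used by the script may differ.
EXPIRED = "Review date is not in the future"
MANUAL = "Review date needs manual review"


def review_date_note(value, today):
    """Return an audit note for one Review_Date value, or '' if it passes."""
    if pd.isna(value) or str(value).strip() == "":
        return ""  # blank cells are left to the required-column check
    if str(value).strip().lower() == "permanent":
        return ""  # Permanent or permanent is acceptable
    parsed = pd.to_datetime(value, errors="coerce")
    if pd.isna(parsed):
        return MANUAL  # text such as a length of time: a person reviews it
    if parsed <= today:
        return EXPIRED  # today or earlier does not count as the future
    return ""


def review_date_notes(df, today=None):
    """Return a Series of notes, one per row, aligned with df's index."""
    if today is None:
        today = pd.Timestamp.today().normalize()
    else:
        today = pd.Timestamp(today)
    notes = [review_date_note(value, today) for value in df["Review_Date"]]
    return pd.Series(notes, index=df.index)


# Made-up values for illustration only.
sample = pd.DataFrame({
    "Review_Date": ["2030-01-01", "2000-01-01", "Permanent",
                    "permanent", "6 months from creation", None],
})
date_notes = review_date_notes(sample, today="2024-06-01")
```

Each value is parsed on its own, so one textual entry does not affect how the real dates in the column are read. How these notes get combined with any existing Audit_Result text is left out of the sketch.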