psu-libraries / researcher-metadata

Penn State University's faculty and research metadata repository
https://metadata.libraries.psu.edu/
MIT License

Detect changes to existing file uploads in Activity Insight and process new files #693

Closed EricDurante closed 1 year ago

EricDurante commented 1 year ago

A big problem with the existing manual workflow for depositing publications from Activity Insight is that a faculty member can go back into Activity Insight at any time, edit one of their publications, and replace an old file upload with a new one, which may be a different version of the publication. When this happens, the librarians who are gathering information in order to deposit the older file may not realize that a new file has been uploaded until after they've already spent time on the version they have. Sometimes it turns out that the older version they've been working on cannot be deposited.

We'd like to save some of this wasted effort by automatically detecting when a publication that already has a file upload from Activity Insight gets a new file upload there. We can skip this process altogether for publications that we know to be open access already and for publications for which we already have a file version that can be deposited. For all other publications with an associated file upload, we should check periodically whether the file (or file location) that we have is still the same file that is stored in Activity Insight. If it's not, we should import the location metadata for the new file while preserving the metadata for the older file. The cron job that orchestrates this workflow should automatically detect publications whose new file metadata is missing version or permissions information, and it should queue up jobs to download the new file and fetch this information.
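To make the proposed check concrete, here is a rough sketch of the logic in Python. It is illustrative only and not the application's actual code: the `Publication` and `FileRecord` classes, their fields, and the `needs_check`, `sync_file`, and `files_to_queue` helpers are all hypothetical names. It also assumes that "the file that we have" means the most recently imported file record.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class FileRecord:
    """Hypothetical stand-in for the stored metadata of one Activity Insight upload."""
    location: str
    version: Optional[str] = None       # None until the version has been determined
    permissions_checked: bool = False   # False until permissions info has been fetched


@dataclass
class Publication:
    """Hypothetical stand-in for a publication record."""
    title: str
    open_access: bool = False
    has_depositable_file: bool = False
    files: List[FileRecord] = field(default_factory=list)  # oldest first


def needs_check(pub: Publication) -> bool:
    # Skip publications already known to be open access, publications that
    # already have a depositable file version, and those with no upload at all.
    return bool(pub.files) and not pub.open_access and not pub.has_depositable_file


def sync_file(pub: Publication, current_location: str) -> bool:
    """Record a new file if Activity Insight now points somewhere else.

    Returns True when a new record was added. Older records are kept.
    """
    if not needs_check(pub):
        return False
    if pub.files[-1].location == current_location:
        return False
    pub.files.append(FileRecord(location=current_location))
    return True


def files_to_queue(pubs: List[Publication]) -> List[Tuple[str, str]]:
    """List (title, location) pairs whose newest file lacks version or permissions info."""
    queue = []
    for pub in pubs:
        if not needs_check(pub):
            continue
        newest = pub.files[-1]
        if newest.version is None or not newest.permissions_checked:
            queue.append((pub.title, newest.location))
    return queue
```

A periodic job could call `sync_file` for each publication with the location currently reported by Activity Insight, then hand the result of `files_to_queue` to whatever job queue downloads the files and fetches version and permissions information.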

ajkiessl commented 1 year ago

We can build this right into the Activity Insight importer. The importer needs to be tweaked to allow new files to be imported. Right now, if a file is already associated with a publication, nothing gets imported.
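As a rough illustration of that tweak, the Python sketch below contrasts the guard described above with a relaxed one. Both functions and their arguments are hypothetical and do not reflect the importer's real interface. File locations are modelled as plain strings.

```python
from typing import List


def import_file_current(existing_locations: List[str], incoming_location: str) -> List[str]:
    # Behavior as described in this comment: if any file is already
    # associated with the publication, nothing gets imported.
    if existing_locations:
        return list(existing_locations)
    return [incoming_location]


def import_file_tweaked(existing_locations: List[str], incoming_location: str) -> List[str]:
    # Assumed tweak: skip only when this exact location is already known;
    # otherwise add the new file alongside the older ones.
    if incoming_location in existing_locations:
        return list(existing_locations)
    return [*existing_locations, incoming_location]
```

With the relaxed guard, re-running the import against an unchanged upload stays a no-op, while a replaced upload adds a second record instead of being ignored.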