psu-libraries / researcher-metadata

Penn State University's faculty and research metadata repository
https://metadata.libraries.psu.edu/
MIT License

File not found message in DOI checking list on QA #834

Closed: anaelizabethenriquez closed this issue 1 year ago

anaelizabethenriquez commented 1 year ago

For publication 68437 on QA, the DOI checking list is displaying "Not Found" in the "Download Files" column. There is an associated Activity Insight OA file record (1465), but its "File download" field is blank.

I'm guessing the file didn't download. I'm not sure whether the download failed or the file is no longer in Activity Insight. Is this a known problem?

If it's not in Activity Insight anymore, I would expect the Activity Insight OA file record to go away, which would probably remove the publication from the scope that appears in that list. I'm not sure if that's actually how it's set up to work, though.

ajkiessl commented 1 year ago

Sorry, I forgot that our filtering logic for ActivityInsightOAFile isn't fully developed yet in QA. Our PRs were conflicting and some got merged before others, so we haven't yet merged this one (https://github.com/psu-libraries/researcher-metadata/pull/817/files#diff-64ff73ecca0b252889293216dd49d8d7cee560c78f268cb6136ca931d3b2daf1R15), which will change the filtering to only filter out file records that already have a ScholarSphere open access location (instead of any open access location). The publication above has Unpaywall and Open Access Button locations, so the file isn't being triggered to download, even though the Publication (which does have the newer filtering logic implemented) is in the DOI checking list.
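To make the difference between the two filtering rules concrete, here is a minimal Python sketch. It is not RMD's actual code: the classes, field names, and source labels such as `"scholarsphere"` are hypothetical stand-ins. It only contrasts the old rule (skip the download if there is any open access location) with the rule described for #817 (skip it only if a ScholarSphere location already exists).

```python
from dataclasses import dataclass, field

# Hypothetical, simplified stand-ins for RMD's models (not the real classes).
@dataclass
class OpenAccessLocation:
    source: str  # hypothetical source labels, e.g. "unpaywall", "scholarsphere"

@dataclass
class Publication:
    id: int
    open_access_locations: list = field(default_factory=list)

@dataclass
class ActivityInsightOAFile:
    id: int
    publication: Publication

def needs_download_old(oa_file):
    # Old rule: skip the file if the publication has ANY open access location.
    return not oa_file.publication.open_access_locations

def needs_download_new(oa_file):
    # Rule described for #817: skip only if a ScholarSphere location already exists.
    return not any(
        loc.source == "scholarsphere"
        for loc in oa_file.publication.open_access_locations
    )

# Publication 68437 on QA has Unpaywall and Open Access Button locations.
pub = Publication(68437, [OpenAccessLocation("unpaywall"),
                          OpenAccessLocation("open_access_button")])
oa_file = ActivityInsightOAFile(1465, pub)
print(needs_download_old(oa_file), needs_download_new(oa_file))  # False True
```

Under the old rule, file 1465 is never queued because non-ScholarSphere locations exist; under the narrower rule it would be downloaded.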

As for deleting files that no longer exist in Activity Insight, we don't have anything implemented to do this yet. Earlier in development, we didn't really have a good way to identify what had been deleted in Activity Insight. We should at least be able to identify when a file has been deleted once this PR is merged (https://github.com/psu-libraries/researcher-metadata/pull/821/files), since we'll be storing the Activity Insight ID of the publication with the file record in RMD. During import, we can use this data to determine whether that publication no longer has a file in Activity Insight and, if so, delete the file record in RMD. We still wouldn't have a way to remove file records when an entire publication record is deleted from Activity Insight, and I'm not sure what an automated solution to that would look like.
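A rough sketch of the import-time check described above, under stated assumptions: each stored file record carries the Activity Insight publication ID (as #821 is described as adding), and the import yields, for each publication it sees, the set of files Activity Insight currently reports. `OAFileRecord`, its `location` field, and `stale_file_records` are hypothetical names for illustration, not RMD code. Publications absent from the import are skipped, which mirrors the remaining gap: a publication deleted outright from Activity Insight is never visited.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OAFileRecord:
    # Hypothetical RMD-side file record.
    rmd_id: int
    ai_publication_id: str  # Activity Insight publication ID stored with the record
    location: str           # hypothetical identifier for the file in Activity Insight

def stale_file_records(imported_publications, existing_records):
    """Return records whose publication appeared in this import but no
    longer lists that file in Activity Insight.

    imported_publications: dict of AI publication ID -> set of file locations.
    existing_records: iterable of OAFileRecord already stored in RMD.
    """
    stale = []
    for record in existing_records:
        files = imported_publications.get(record.ai_publication_id)
        if files is None:
            # Publication not seen in this import (e.g. deleted entirely from
            # Activity Insight): this approach can't tell, so leave it alone.
            continue
        if record.location not in files:
            stale.append(record)
    return stale

records = [
    OAFileRecord(1, "AI-1", "a.pdf"),
    OAFileRecord(2, "AI-1", "b.pdf"),
    OAFileRecord(3, "AI-2", "c.pdf"),
]
# AI-1 now lists only a.pdf; AI-2 didn't appear in this import.
print([r.rmd_id for r in stale_file_records({"AI-1": {"a.pdf"}}, records)])  # [2]
```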

anaelizabethenriquez commented 1 year ago

@ajkiessl Thanks. If I'm understanding correctly, the immediate problem, of something that doesn't really need its DOI checked showing up in the DOI checking list, will be moot once that code from #817 is merged.

I'm going to create a separate issue (#835) about removing files that are no longer in Activity Insight, so we can discuss that further.

ajkiessl commented 1 year ago

> the immediate problem, of something that doesn't really need its DOI checked showing up in the DOI checking list, will be moot once that code from https://github.com/psu-libraries/researcher-metadata/pull/817 is merged.

@anaelizabethenriquez Well, not exactly. #817 will fix the issue of the file not existing and not being downloadable/viewable within RMD.

This record technically should be in the DOI checking list, although it appears to be a false negative. The DOI was not verified for this publication because the title from Unpaywall appears to be incomplete (https://api.unpaywall.org/v2/10.1159/000202987?email=openaccess@psu.edu), so the title matching we do during DOI verification fails.
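To show why an incomplete title breaks verification, here is a sketch that assumes the match rule is "identical after lowercasing and stripping punctuation". RMD's actual matching rules may differ, and the titles below are made up rather than the real title of this publication.

```python
import re

def normalize_title(title):
    # Lowercase, turn punctuation into spaces, and collapse runs of whitespace.
    return " ".join(re.sub(r"[^\w\s]", " ", title.lower()).split())

def titles_match(rmd_title, unpaywall_title):
    # Assumed rule: titles must be identical after normalization.
    return normalize_title(rmd_title) == normalize_title(unpaywall_title)

# Made-up titles for illustration.
full = "A Study of Things: Part One"
print(titles_match(full, "a study of things - part one"))  # True
print(titles_match(full, "A Study of Things"))  # False (truncated title)
```

Normalization absorbs differences in case and punctuation, but a truncated title still differs, so the DOI ends up unverified even though it is correct.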

> I'm going to create a separate issue (https://github.com/psu-libraries/researcher-metadata/issues/835) about removing files that are no longer in Activity Insight, so we can discuss that further.

OK, sounds good.

anaelizabethenriquez commented 1 year ago

Ah, OK. I understand now. Thanks, @ajkiessl !