psu-libraries / researcher-metadata

Penn State University's faculty and research metadata repository
https://metadata.libraries.psu.edu/
MIT License

Validate user-submitted OA URL to check if URL is actually OA #778

Open anaelizabethenriquez opened 1 year ago

anaelizabethenriquez commented 1 year ago

Thinking about #487 and the work we might put in to fix that makes me wonder if we'd be better off spending those resources on validation for that field that checks if the URL is actually OA. For many URLs, we should be able to send the URL to OA Button and learn whether it is OA. This type of validation would be more useful to the overall product (since we'd reduce the possibility of non-OA URLs getting served up to the public as OA URLs in researcher profiles) and might avoid some of the issues we're encountering with the current validation.

We'd need to consider what to do if the validation fails, though. One option would be to give the user a way to submit the URL to the Scholarly Communications and Copyright team (by sending a form email to openaccess@psu.edu?) for manual review. In that case, admins would need to be able to bypass the validation that's on the user-facing field.

EricDurante commented 1 year ago

I may not be thinking about this correctly, but isn't the purpose of this feature to gather URLs that are not already listed in OA Button's data (or any other source of OA information that we use)? If the URL is already in OA Button, wouldn't we have already found it automatically and avoided asking the faculty member to submit a URL in the first place?

Or would this cover cases where the URL is in OA Button, but we couldn't find it automatically for some reason (e.g., we have bad or incomplete metadata)?

anaelizabethenriquez commented 1 year ago

You're right about all this, @EricDurante, and I think that's why we didn't go for stronger validation originally.

But, since we're only searching OA Button (and Unpaywall) by DOI, we're not getting OA URLs automatically for publications where (1) RMD does not know the DOI or (2) the publication does not have a DOI. OA articles in either of these situations are what we want to serve with this feature. For (1), validating with OA Button should work well; for (2), it may result in more false negatives (hence the need to refer users to SCC for manual help).

Those false negatives would be annoying, much like the problems described in #487. But it might be a smaller set of false negatives than we're currently getting. And, we could reduce the "false positives" (i.e. OA URLs that aren't actually OA) that the current validation doesn't attempt to address.

EricDurante commented 1 year ago

@anaelizabethenriquez That makes sense. Thanks for walking through that reasoning!

A minor correction (but I'm not sure if it changes anything): Absent a DOI, we do query OA Button by publication title. However, I'm not sure how closely the given title has to match OA Button's data in order to return a result.
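For reference, the lookup order described in this comment (DOI when known, title otherwise) could be sketched roughly as below. This is only an illustration: `find_oa_url`, `by_doi`, and `by_title` are hypothetical names, and the two lookups are passed in as plain callables so the sketch makes no claims about how OA Button is actually called or how closely titles must match.

```python
from typing import Callable, Optional

# Hypothetical lookup type: takes a DOI or title, returns an OA URL or None.
Lookup = Callable[[str], Optional[str]]


def find_oa_url(
    doi: Optional[str],
    title: Optional[str],
    by_doi: Lookup,
    by_title: Lookup,
) -> Optional[str]:
    """Query by DOI when one is known; absent a DOI, query by title."""
    if doi:
        return by_doi(doi)
    if title:
        return by_title(title)
    return None
```

Whether a failed DOI lookup should also fall through to a title search is a design choice this sketch leaves out, since the thread only mentions the title query for publications without a DOI.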

anaelizabethenriquez commented 1 year ago

Ah, good to know; thanks! @EricDurante

ajkiessl commented 1 year ago

This all sounds good. We could send a request to OA Button and if that fails, we could have a pop-up asking the user if they'd like to send this URL to openaccess@psu.edu for admins to review. I think it could be entirely automated from there. We'd just need to pass along the user's access ID, publication ID, and URL.

If we did that, though, we'd potentially want to prevent the user from trying to submit again, which would require storing something like an email_sent timestamp (or something else to indicate this state) with the authorship after the email is sent.

Also, side note, admins can bypass the validation when adding OA URLs in the admin dashboard.

And another thing, would it make sense to also try Unpaywall if OA Button returns nothing?
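A rough sketch of the flow proposed in this comment (check the URL, offer an email to openaccess@psu.edu if the check fails, then block resubmission once the email is sent) might look like the following. Everything here is an assumption made for illustration: the `Authorship` fields, the `review_email_sent_at` name, and the injected `is_open_access` and `send_email` callables are hypothetical, not the application's actual model or mailer.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Callable, Optional


class Outcome(Enum):
    ACCEPTED = "accepted"          # the check confirmed the URL is OA
    OFFER_REVIEW = "offer_review"  # check failed; ask the user about emailing admins
    ALREADY_SENT = "already_sent"  # a review email already went out


@dataclass
class Authorship:
    # Hypothetical stand-in for the real authorship record.
    access_id: str
    publication_id: int
    oa_url: Optional[str] = None
    review_email_sent_at: Optional[datetime] = None


def submit_url(
    authorship: Authorship,
    url: str,
    is_open_access: Callable[[str], bool],
) -> Outcome:
    """Validate a user-submitted URL and decide what the form should do next."""
    if authorship.review_email_sent_at is not None:
        return Outcome.ALREADY_SENT
    if is_open_access(url):
        authorship.oa_url = url
        return Outcome.ACCEPTED
    return Outcome.OFFER_REVIEW


def request_review(
    authorship: Authorship,
    url: str,
    send_email: Callable[[dict], None],
) -> None:
    """Pass the access ID, publication ID, and URL to admins, then record the send time."""
    send_email(
        {
            "access_id": authorship.access_id,
            "publication_id": authorship.publication_id,
            "url": url,
        }
    )
    authorship.review_email_sent_at = datetime.now(timezone.utc)
```

Keeping the external check and the mailer as injected callables means the decision logic can be exercised without touching a real service.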

anaelizabethenriquez commented 1 year ago

> And another thing, would it make sense to also try Unpaywall if OA Button returns nothing?

Sure. I wasn't sure if they had a way to search by URL, but if they do that would be great.

ajkiessl commented 1 year ago

Ah right. They just have a DOI and title search.

ajkiessl commented 1 year ago

Or we could remove validation altogether and replace the form with something that just sends an email to openaccess@psu.edu for admin review of the URL.

EricDurante commented 1 year ago

Just a note: if we do that 👆, we'll still need to record some indicator that the person submitted the URL via email so that we know that they took some action and should be excluded from future open access reminder emails about that publication. If an admin reviews the email and determines that the submitted URL is not valid/correct, they could perhaps remove this indicator so that the publication goes back into the OA reminder email workflow for the author to try again, or, more likely, they might manually resolve the situation in some other way (finding a correct URL, directly informing the author that the URL was bad and asking them to deposit the publication, etc.)
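To make the indicator idea above concrete, here is a minimal sketch of how it could interact with the reminder workflow. The names (`AuthorshipRecord`, `url_submitted_at`, `needs_reminder`, `reject_submission`) are hypothetical and chosen only for illustration; the real schema and reminder query may differ.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass
class AuthorshipRecord:
    # Hypothetical fields, not the application's actual schema.
    access_id: str
    publication_id: int
    oa_url: Optional[str] = None
    url_submitted_at: Optional[datetime] = None  # set when the user emails a URL


def needs_reminder(record: AuthorshipRecord) -> bool:
    """A publication stays in the reminder workflow until it has an OA URL
    or the author has submitted one for admin review."""
    return record.oa_url is None and record.url_submitted_at is None


def reminder_recipients(records: Iterable[AuthorshipRecord]) -> List[AuthorshipRecord]:
    """Filter out authorships whose author already took action."""
    return [r for r in records if needs_reminder(r)]


def reject_submission(record: AuthorshipRecord) -> None:
    """Admin found the submitted URL invalid: clear the indicator so the
    publication re-enters the reminder workflow."""
    record.url_submitted_at = None
```

An admin who resolves the case manually (for example, by adding a correct URL) would set `oa_url` instead, which also keeps the publication out of future reminders.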