sul-dlss / sul_pub

SUL system for harvesting and managing publications for Stanford CAP, with controlled API access.
http://cap.stanford.edu

Links out to Web of Science from Profiles pages are not working correctly #1195

Closed peetucket closed 1 year ago

peetucket commented 4 years ago

We use the WoS Expanded API for harvesting publications for display on Stanford Profiles pages. As part of this, we link each publication back to the full WoS record from each Profile page.

We just noticed that the links we provide no longer work (and I'm not sure how long they have been broken).

Our URL pattern is this: https://ws.isiknowledge.com/cps/openurl/service?url_ver=Z39.88-2004&rft_id=info:ut/000430958200010

where the last part of the URL is the WoS UID. You can see an example of these links on a profile page here: https://profiles.stanford.edu/206688?tab=publications
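A minimal sketch of how such a link is formed, assuming plain string concatenation of the base URL and the identifier. The helper name `wos_record_url` is hypothetical and is not taken from the sul_pub code:

```python
# Base URL exactly as quoted in this issue.
ARTICLE_BASE_URI = (
    "https://ws.isiknowledge.com/cps/openurl/service"
    "?url_ver=Z39.88-2004&rft_id=info:ut/"
)


def wos_record_url(uid: str, base_uri: str = ARTICLE_BASE_URI) -> str:
    """Hypothetical helper: append the WoS identifier to the base URL."""
    uid = uid.strip()
    if not uid:
        raise ValueError("empty WoS identifier")
    return base_uri + uid


# Identifier from the example link above.
print(wos_record_url("000430958200010"))
```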

If this full record citation page for the WoS records has changed, this is going to force us to do a pretty big remediation of our database.

The base URL is in our settings: https://github.com/sul-dlss/sul_pub/blob/master/config/settings.yml#L52 (Settings.SCIENCEWIRE.ARTICLE_BASE_URI)

This URL gets embedded in each and every WoS/Sciencewire publication in the :identifier section of the pub_hash, and is then passed back to the Profiles site via our API. They use it to build the links out. So if we need to change the base URL, it will require rebuilding the pub_hash for every single publication (i.e. lots and lots of database updates and then a massive update on the Profiles end too with a ton of API calls to paginate through them all).
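To illustrate why a base URL change is expensive, here is a rough sketch of the kind of rewrite it would imply. The record layout (an `identifier` list of dicts with a `url` key) and the function name are assumptions for illustration only; the real pub_hash structure may differ:

```python
OLD_BASE = (
    "https://ws.isiknowledge.com/cps/openurl/service"
    "?url_ver=Z39.88-2004&rft_id=info:ut/"
)


def rebase_identifier_urls(pub_hashes, old_base, new_base):
    """Rewrite stored link-out URLs in place; return how many records changed.

    Assumed layout: each pub_hash is a dict whose "identifier" entry is a
    list of dicts that may carry a "url" key.
    """
    changed = 0
    for pub_hash in pub_hashes:
        touched = False
        for ident in pub_hash.get("identifier", []):
            url = ident.get("url", "")
            if url.startswith(old_base):
                ident["url"] = new_base + url[len(old_base):]
                touched = True
        if touched:
            changed += 1
    return changed
```

Every record that carries the old base has to be touched, and each changed record would then need to be re-sent to Profiles, which is where the cost comes from.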

Contacted Clarivate support with this case ID: Clarivate Analytics Case # CM-200903-4124039 : Links & URL [ ref:_00D411O1D5._5004No6QAl:ref ]

peetucket commented 4 years ago

Response 1 from Clarivate, unclear to us.

Thank you for your patience and I am sorry for the delayed response.

Upon investigating further, we found that the URL you have provided is from Links AMR. Could you please provide the old URL that you were referring to?

Also, the link you have provided is working from our internal network. It gives a gateway link, which then leads to Web of Science. When we tried to access the URL from outside our network, it led to the Web of Science Roaming page. This indicates that the links may need to be accessed from Web of Science authorized IPs.

In addition to that, please note that the OpenURL inbound syntax URL used by Stanford to access WoS does NOT assume access to CEL for non-subscribers

https://ws.isiknowledge.com/cps/openurl/service?url_ver=Z39.88-2004&rft_id=info:ut/000430958200010

It is routed/resolved via Links Gateway with parameter DestApp=ALL_WOS designed to failover (for non-subscribers) to Web of Science log in/roaming page.

The issue reported may have to do with changed user and patrons entitlement to Web of Science.

In case Stanford needs failover to CEL for all links to WoS we can work with the university to provide new syntax (likely direct Gateway syntax) to be embedded in Stanford pages.

My response back to them:

>>>> Upon investigating further, we found that the URL you have provided is from Links AMR. Could you please provide the old URL that you were referring to?

So basically each publication in our Profile system links to the source publication in the Web of Science using the WOS identifier we get from the API. This link is formed by appending the identifier to a base URL. The base URL we have been using for literally years (like maybe even >8 years) is this:

https://ws.isiknowledge.com/cps/openurl/service?url_ver=Z39.88-2004&rft_id=info:ut/

After appending an identifier, we get something like this:

https://ws.isiknowledge.com/cps/openurl/service?url_ver=Z39.88-2004&rft_id=info:ut/000081515000015

This is the "old URL" I am referring to and the one we were expecting would still be resolving instead of landing on the generic log-in page. I understand this may be a page for subscribers only, but even on a full-tunnel Stanford VPN, it still lands me on your login page. This must be some change in behavior because it has been resolving for years up until now. I cannot test on campus since most everyone is still working from home, but my impression is that being on the Stanford VPN is the same from a network perspective as being on campus.

>>> In case Stanford needs failover to CEL for all links to WoS we can work with the university to provide new syntax (likely direct Gateway syntax) to be embedded in Stanford pages.

I'm not sure what you mean by this statement. Changing the base URL would unfortunately be a significant change for us.

peetucket commented 4 years ago

Response #2 from Clarivate:

Could you please check if you can access Web of Science directly from your VPN (www.webofknowledge.com). If you can, then those links should work.
If the issue still persists, it means the VPN is not entitled to Web of Science. In this instance you would need to get your VPN IP range added to your account.

Could you please try and let us know the outcome?

When on VPN, either full tunnel or regular, the link they gave us (https://www.webofknowledge.com) goes to the generic login page. I am letting them know that.

peetucket commented 4 years ago

From Grace:

I’m connected to VPN.  You should know that the libraries blocked the IP addresses to the web version for WoS due to security concerns.   This change was made before dual factor was implemented.   One option would be to have VPN access permitted for WoS.  So right now, VPN access doesn’t work.   I’m copying Irina who manages EZProxy access for the libraries.  Allowing VPN would certainly be less crazy than having to reload all of the records in Stanford Profiles to fix this “broken link” problem.

From a publication link in Stanford Profiles, I wasn’t able to access the WoS record from either FF or Chrome.  FF is set up with EZProxy and Chrome isn’t for me.  In FF, got a blank page that eventually rolled to the basic page.   In Chrome, everything stopped at a WoS login page. 

https://login.stanford.edu/idp/profile/SAML2/POST/SSO?execution=e1s2

Hope these clues help!

peetucket commented 4 years ago

More clues from Grace. Perhaps we authenticate differently now and need to prepend a proxy URL to our URLs:

From Grace:

It is across campus for virtually all content licensed by SUL.  This includes all of the databases such as WoS, journals, ebooks, etc. I’m assuming that the full-text link-out is leveraging our library subscription to the web interface, not going to WoS through another route.

Just checked out catalog record and the web interface link is https://stanford.idm.oclc.org/login?url=http://webofknowledge.com/WOS  so perhaps the path and method to access the WoS data is different when starting from Stanford Profiles.   From the user perspective, the access in Stanford Profiles looked like “regular web access” a user would see if they searched the WoS interface directly. 

From me:

Aha. Very interesting. If I prepend the proxy link to the URLs we are currently making, it works (after going through auth), like this:

https://stanford.idm.oclc.org/login?url=https://ws.isiknowledge.com/cps/openurl/service?url_ver=Z39.88-2004&rft_id=info:ut/000430958200010

And you don't need to be on VPN either.  I wonder if this may give us an alternate path (though the work would fall on Profiles).  Basically they have to prepend that proxying link before the links we currently have each time…but they may be able to do that programmatically.

And from what I understand, this is due to some changes at the Stanford level regarding security concerns?

mjgiarlo commented 4 years ago

I have removed this issue from the "open issues" sheet in the infra team prod tracker. Leaving it to @peetucket to close this issue once appropriate.

peetucket commented 4 years ago

The latest update from Clarivate and the Library is that the links will no longer resolve when on VPN, by security design. The best path forward is for Profiles to programmatically prepend the proxy URL (https://stanford.idm.oclc.org/login?url=) to the links when building the page, and to make it a configuration parameter so that it's easy to change in the future if needed. I'll track this issue as the work occurs with Profiles in case we need to take further action (which we should not). I don't think we want to manually update all database records on our end, as this would leave us vulnerable to the proxy URL changing again in the future.
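A rough sketch of what that could look like on the page-building side, assuming the stored link is left untouched and the proxy prefix comes from configuration. The names `LinkSettings` and `proxied` are hypothetical and do not come from the Profiles or sul_pub code. The target URL is appended unencoded, matching the form that was reported to work earlier in this thread:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkSettings:
    # Proxy prefix from this thread; an empty string disables proxying.
    proxy_prefix: str = "https://stanford.idm.oclc.org/login?url="


def proxied(url: str, settings: LinkSettings = LinkSettings()) -> str:
    """Prepend the configured proxy prefix at render time.

    Returns the URL unchanged when proxying is disabled or when the
    prefix is already present, so applying it twice is harmless.
    """
    prefix = settings.proxy_prefix
    if not prefix or url.startswith(prefix):
        return url
    return prefix + url


stored = (
    "https://ws.isiknowledge.com/cps/openurl/service"
    "?url_ver=Z39.88-2004&rft_id=info:ut/000430958200010"
)
print(proxied(stored))
```

Because the prefix is applied only when the page is rendered, a future change to the proxy URL would be a one-line configuration change rather than a rewrite of every stored record.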