zotero / translation-server

A Node.js-based server to run Zotero translators

Redirect / Translator selection problem (JavaScript, www.redi-bw.de => ebscohost) #49

Open mtrojan-ub opened 6 years ago

mtrojan-ub commented 6 years ago

URL: http://www.redi-bw.de/db/ebsco.php/search.ebscohost.com/login.aspx%3fdirect%3dtrue%26db%3dreh%26AN%3dATLAiGFE171113003879%26site%3dehost-live

Result: [{"key":"8VGWTYLW","version":0,"itemType":"webpage","url":"http://web.b.ebscohost.com/plink?key=10.81.11.197_8000_1381733013&scope=site&db=reh&AN=ATLAiGFE171113003879&site=ehost-live","title":"","accessDate":"2018-11-07T14:46:22Z"}]

If you open the above URL in a browser, you are redirected to this URL: http://web.a.ebscohost.com/ehost/detail/detail?vid=0&sid=210b94b5-d4c1-41a3-bc71-a267c3a20ce2%40sessionmgr4007&bdata=JnNpdGU9ZWhvc3QtbGl2ZQ%3d%3d#AN=ATLAiGFE171113003879&db=reh

If you use the web endpoint on this URL directly, you get the following result: [{"key":"M9FHFV2Q","version":0,"itemType":"journalArticle","creators":[{"firstName":"Evgenia","lastName":"Fotiou","creatorType":"author"},{"firstName":"Diana","lastName":"Riboli","creatorType":"author"},{"firstName":"Davide","lastName":"Torri","creatorType":"author"},{"firstName":"Dimitra Mari","lastName":"Varvarezou","creatorType":"author"}],"tags":[{"tag":"International Society for Academic Research on Shamanism","type":1},{"tag":"Shamanism -- Study and teaching","type":1},{"tag":"Animism","type":1},{"tag":"Peer reviewed","type":1}],"title":"The First Conference of the International Society for Academic Research on Shamanism (ISARS), Delphi, Greece, in 2015","date":"2017","journalAbbreviation":"Shaman","volume":"25","issue":"1-2","pages":"5-14","ISSN":"1216-7827","libraryCatalog":"EBSCOhost","publicationTitle":"Shaman"}]

Is there a redirect problem?

mtrojan-ub commented 6 years ago

This might be a bad example because the redirect depends on the source IP of the request, so you will not be able to reproduce it from your location. I will provide additional information on how the redirect works.

mtrojan-ub commented 6 years ago

curl -v http://www.redi-bw.de/db/ebsco.php/search.ebscohost.com/login.aspx%3fdirect%3dtrue%26db%3dreh%26AN%3dATLAiGFE171113003879%26site%3dehost-live

*   Trying 129.143.8.99...
* TCP_NODELAY set
* Connected to www.redi-bw.de (129.143.8.99) port 80 (#0)
> GET /db/ebsco.php/search.ebscohost.com/login.aspx%3fdirect%3dtrue%26db%3dreh%26AN%3dATLAiGFE171113003879%26site%3dehost-live HTTP/1.1
> Host: www.redi-bw.de
> User-Agent: curl/7.58.0
> Accept: */*
> 
< HTTP/1.1 302 Found
< Date: Wed, 07 Nov 2018 15:52:29 GMT
< Server: Apache
< Location: http://www-fr.redi-bw.de/db/ebsco.php/search.ebscohost.com/login.aspx?direct=true&db=reh&AN=ATLAiGFE171113003879&site=ehost-live
< Content-Length: 324
< Content-Type: text/html; charset=iso-8859-1
< 
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a href="http://www-fr.redi-bw.de/db/ebsco.php/search.ebscohost.com/login.aspx?direct=true&amp;db=reh&amp;AN=ATLAiGFE171113003879&amp;site=ehost-live">here</a>.</p>
</body></html>
* Connection #0 to host www.redi-bw.de left intact

I suppose this kind of redirect is not yet supported by the Zotero translation-server?

dstillman commented 6 years ago

Redirects should be handled fine. What's the debug output?

mtrojan-ub commented 6 years ago

translation-server@2.0.0 start /app
node src/server.js

(3)(+0000000): Translators initialized with 546 loaded
(3)(+0000005): Listening on 0.0.0.0:1969
(3)(+0017564): HTTP GET http://www.redi-bw.de/db/ebsco.php/search.ebscohost.com/login.aspx%3fdirect%3dtrue%26db%3dreh%26AN%3dATLAiGFE171113003879%26site%3dehost-live
(3)(+0001713): Translators: Looking for translators for http://web.a.ebscohost.com/plink?key=10.83.8.64_8000_1796661229&scope=site&db=reh&AN=ATLAiGFE171113003879&site=ehost-live
(4)(+0000017): Translate: Binding sandbox to http://web.a.ebscohost.com/plink?key=10.83.8.64_8000_1796661229&scope=site&db=reh&AN=ATLAiGFE171113003879&site=ehost-live
(4)(+0000001): Translate: Parsing code for unAPI (e7e01cac-1e37-4da6-b078-a0e8343b0e98, 2018-05-12 15:58:17)
(4)(+0000004): Translate: Parsing code for COinS (05d07af9-105a-4572-99f6-a8e231c0daef, 2015-06-04 03:25:10)
(4)(+0000001): Translate: Parsing code for Embedded Metadata (951c027d-74ac-47d4-a107-9c3069ab7b48, 2018-11-01 19:46:46)
(3)(+0000003): Translate: Embedded Metadata: found 0 meta tags.
(4)(+0000000): Translate: Parsing code for DOI (c159dcfe-8a53-4301-a499-30f6549c340d, 2016-11-05 10:57:01)
(3)(+0000002): Translate: All translator detect calls and RPC calls complete:
(3)(+0000000):  No suitable translators found
(5)(+0000000): Translate: Running handler 0 for translators
(3)(+0000000): No translators found -- saving as a webpage
(5)(+0000003): Translate: Running handler 1 for translators

I see... so could this be a translator selection problem in combination with the redirect?

dstillman commented 6 years ago

Since we can't access this, you're going to have to debug this on your end. From the above debug output, it's trying to translate the /plink page, which isn't the same as the /ehost page you mention above. You'll have to compare that to what you get from curl -v -L and possibly view the page content from the translation-server code.
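
For comparing the HTTP-level redirect chain from Node rather than from curl, a rough sketch could look like the following. It assumes Node 18+ with the built-in fetch; traceRedirects is a hypothetical helper name, not something translation-server provides. It only follows HTTP 3xx redirects, so a redirect performed by JavaScript in the page will not show up in the chain.

```javascript
// Hypothetical debugging helper (not part of translation-server).
// Follows HTTP redirects one hop at a time and records each URL and status.
// fetchFn is injectable so the logic can be exercised without network access.
async function traceRedirects(startUrl, fetchFn = fetch, maxHops = 10) {
  const hops = [];
  let url = startUrl;
  for (let i = 0; i < maxHops; i++) {
    // redirect: 'manual' asks fetch not to follow the redirect itself
    const res = await fetchFn(url, { redirect: 'manual' });
    hops.push({ url, status: res.status });
    const location = res.headers.get('location');
    const isRedirect = res.status >= 300 && res.status < 400;
    if (!isRedirect || !location) {
      break;
    }
    // The Location header may be relative, so resolve it against the current URL
    url = new URL(location, url).href;
  }
  return hops;
}
```

Calling traceRedirects with the redi-bw.de URL and logging the returned array should show where the HTTP chain ends, which can then be compared with the URL in the "Looking for translators for" debug line.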

mtrojan-ub commented 6 years ago

I will try. It would be great if you could have a look at #48. Seems to be a similar problem, but with a test case that can be reproduced anywhere.

mtrojan-ub commented 6 years ago

It seems like there are multiple redirects happening here. Redirection from redi-bw.de to the plink page works fine. The plink page itself contains a JS redirect:

<!DOCTYPE html><html><head>
<script type='text/javascript' src='http://if.ebsco-content.com/interfacefiles/17.108.0.2366/javascript/json2/json2.js'></script>
<script type='text/javascript' src='http://if.ebsco-content.com/interfacefiles/17.108.0.2366/javascript/polyfill/sessionstorage.js'></script>

<script type='text/javascript'>
    window.onload = function () {

var aAuthStorage = 'dGJyMLOr4Xqv199Iq621UeSjskrgqqtRs67fReOmtE6v2rdRr9u2Tb7p44vx3+2G693wSa6p';
var aAuthStorageKey = 'gbit.s9512272.main.ehost.Or4Xq';
localStorage.setItem(aAuthStorageKey, aAuthStorage);
sessionStorage.setItem(aAuthStorageKey, aAuthStorage);
window.location.replace('ehost/detail/detail?vid=0&sid=55cb1aa0-779f-42b4-958a-e0661d991e85%40sessionmgr103&bdata=JnNpdGU9ZWhvc3QtbGl2ZQ%3d%3d');
}
</script></head><body></body></html>

What would be best practice here? Should I write a translator that handles the redirect and calls the correct translator on the target URL?
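
To make the question concrete, here is a minimal sketch of how the redirect target could be pulled out of the plink page. It assumes the page always uses window.location.replace with a quoted string literal, as in the HTML above; extractJsRedirect is a hypothetical helper name, and this is not an existing translation-server or translator API.

```javascript
// Hypothetical helper: find the target of a window.location.replace('...')
// call in raw HTML and resolve it against the URL of the page it came from.
function extractJsRedirect(html, pageUrl) {
  const match = /window\.location\.replace\(\s*(['"])(.*?)\1\s*\)/.exec(html);
  if (!match) {
    return null; // no JS redirect of this form found
  }
  // The target is relative on the plink page, so resolve it against pageUrl
  return new URL(match[2], pageUrl).href;
}

// Sample input taken from the debug output and the plink page shown above
const plinkUrl = 'http://web.a.ebscohost.com/plink?key=10.83.8.64_8000_1796661229&scope=site&db=reh&AN=ATLAiGFE171113003879&site=ehost-live';
const plinkHtml = "<script type='text/javascript'>window.onload = function () {" +
  "window.location.replace('ehost/detail/detail?vid=0&sid=55cb1aa0-779f-42b4-958a-e0661d991e85%40sessionmgr103&bdata=JnNpdGU9ZWhvc3QtbGl2ZQ%3d%3d');" +
  "}</script>";

console.log(extractJsRedirect(plinkHtml, plinkUrl));
```

For the sample above this yields a URL under /ehost/detail/detail on the same host as the plink page. Whether the detail page then loads correctly without the localStorage and sessionStorage values set by the script is a separate question that this sketch does not address.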