zotero / translation-server

A Node.js-based server to run Zotero translators
Other
117 stars 48 forks source link

HTTP GET for BigThink URLs of web endpoint is excessively slow #63

Closed dhimmel closed 5 years ago

dhimmel commented 5 years ago

We noticed that we were getting intermittent 504 Gateway Time-out errors for some translation-server queries in https://github.com/greenelab/manubot/pull/87#issuecomment-447393765. We've configured translation-server behind an nginx reverse proxy. When the error occurs, the response from https://translate.manubot.org is:

<html>
<head><title>504 Gateway Time-out</title></head>
<body bgcolor="white">
<center><h1>504 Gateway Time-out</h1></center>
<hr><center>nginx/1.14.0 (Ubuntu)</center>
</body>
</html>

This can be triggered by running the following command many times:

curl \
  --header "Content-Type: text/plain" \
  --data 'https://bigthink.com/neurobonkers/a-pirate-bay-for-science' \
  'https://translate.manubot.org/web'

Here is are the log output from the translation-server (comment indicates where the delay happens, when we watch live with tail --follow):

(3)(+0038384): HTTP GET https://bigthink.com/neurobonkers/a-pirate-bay-for-science
## Big time delay here, node using 99% of CPU during this delay
(3)(+0025766): Translators: Looking for translators for https://bigthink.com/neurobonkers/a-pirate-bay-for-science

If I run curl https://bigthink.com/neurobonkers/a-pirate-bay-for-science locally or on our translation-server's system, there is no delay getting the response. Therefore, I don't know why HTTP GET is taking so long when performed by translation-server. Perhaps something else is going on that is causing the delay? CCing @dongbohu from our team.

dhimmel commented 5 years ago

Other BigThink URLs also seem slow, although perhaps not as slow, when queried via translation-server (but not via other means):

curl \
  --header "Content-Type: text/plain" \
  --data 'https://bigthink.com/politics-current-affairs/new-estimate-deaths-from-us-nuclear-tests' \
  'https://translate.manubot.org/web'
dongbohu commented 5 years ago

The default timeout value on Nginx is 60 seconds. We have increased it to 300 seconds and see whether the timeout error will be gone.

dongbohu commented 5 years ago

src/webSession.js sets a variable SERVER_TRANSLATION_TIMEOUT to 30, but it doesn't seem to be used anywhere.

dhimmel commented 5 years ago

When I run a translation-server locally, and then run:

curl \
  --header "Content-Type: text/plain" \
  --data 'https://bigthink.com/neurobonkers/a-pirate-bay-for-science' \
  'http://127.0.0.1:1969/web'

The response is slow but not that slow... perhaps like 5 seconds which is acceptable compared to 60+ seconds we were seeing on our remote server.

dstillman commented 5 years ago

Make sure you're using the latest version of translation-server.

Translating that page takes 0.6 seconds for me.

$  time ./translate_url 'https://bigthink.com/politics-current-affairs/new-estimate-deaths-from-us-nuclear-tests'
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 1969 (#0)
> POST /web HTTP/1.1
> Host: 127.0.0.1:1969
> User-Agent: curl/7.54.0
> Accept: */*
> Content-Type: text/plain
> Content-Length: 87
>
* upload completely sent off: 87 out of 87 bytes
< HTTP/1.1 200 OK
< Content-Type: application/json; charset=utf-8
< Content-Length: 452
< Date: Fri, 14 Dec 2018 22:41:05 GMT
< Connection: keep-alive
<
* Connection #0 to host 127.0.0.1 left intact
[{"key":"56PVEUX8","version":0,"itemType":"webpage","creators":[],"tags":[],"title":"At least 340,000 Americans died from radioactive fallout between 1951 and 1973","websiteTitle":"Big Think","date":"2018-12-14T16:32:16","url":"https://bigthink.com/politics-current-affairs/new-estimate-deaths-from-us-nuclear-tests","abstractNote":"Domestic nuclear testing wreaked havoc on thousands of families.","language":"en","accessDate":"2018-12-14T22:41:05Z"}]
real    0m0.687s
user    0m0.006s
sys 0m0.006s
dhimmel commented 5 years ago

I updated to the latest translation-server, using the following command (& output):

translate@translation-server:~/translation-server$ git pull --ff-only --recurse-submodules
remote: Enumerating objects: 7, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (1/1), done.
remote: Total 4 (delta 3), reused 3 (delta 3), pack-reused 0
Unpacking objects: 100% (4/4), done.
From https://github.com/zotero/translation-server
   3c80836..cd8b177  master     -> origin/master
Fetching submodule modules/translators
From https://github.com/zotero/translators
   f9de522..5e4b941  master     -> origin/master
Fetching submodule modules/zotero
Fetching submodule modules/zotero/chrome/content/zotero/locale/csl
Fetching submodule modules/zotero/styles
Fetching submodule modules/zotero/translators
From git://github.com/zotero/translators
   f9de522..5e4b941  master     -> origin/master
Updating 3c80836..cd8b177
Fast-forward
 package-lock.json | 6 +++---
 package.json      | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

I restarted translation-server. Then I ran:

time ./translate_url 'https://bigthink.com/politics-current-affairs/new-estimate-deaths-from-us-nuclear-tests'
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 1969 (#0)
> POST /web HTTP/1.1
> Host: 127.0.0.1:1969
> User-Agent: curl/7.58.0
> Accept: */*
> Content-Type: text/plain
> Content-Length: 87
> 
* upload completely sent off: 87 out of 87 bytes
< HTTP/1.1 200 OK
< Content-Type: application/json; charset=utf-8
< Content-Length: 452
< Date: Sat, 15 Dec 2018 00:50:19 GMT
< Connection: keep-alive
< 
* Connection #0 to host 127.0.0.1 left intact
[{"key":"NMX64QGU","version":0,"itemType":"webpage","creators":[],"tags":[],"title":"At least 340,000 Americans died from radioactive fallout between 1951 and 1973","websiteTitle":"Big Think","date":"2018-12-14T16:32:16","url":"https://bigthink.com/politics-current-affairs/new-estimate-deaths-from-us-nuclear-tests","abstractNote":"Domestic nuclear testing wreaked havoc on thousands of families.","language":"en","accessDate":"2018-12-15T00:50:19Z"}]
real    0m12.638s
user    0m0.007s
sys 0m0.008s

12 seconds total, so I think the problem is still there.

dstillman commented 5 years ago

Did you run npm i?

dhimmel commented 5 years ago

Running npm install fixed it. The output of that command was:

added 7 packages from 10 contributors, removed 16 packages, updated 10 packages and audited 450 packages in 9.689s
found 0 vulnerabilities

Now both BigThink URLs complete in ~1 second.

Interesting that the having outdated dependencies would cause translation-server to slow down but still work. Thanks @dstillman for the insight. I've updated our server update instructions at https://github.com/greenelab/manubot/issues/82#issuecomment-447037867 to include the npm install command.

dhimmel commented 5 years ago

Post update, our CI tests showed that the following web query is no longer working:

curl --verbose \
 --header "Content-Type: text/plain" \
 --data 'https://nyti.ms/1NuB0WJ' \
 'https://translate.manubot.org/web'

Here is the corresponding translation-server stdout log:

(3)(+0197565): HTTP GET https://nyti.ms/1NuB0WJ

(1)(+0015219): Error: ESOCKETTIMEDOUT

    Error: ESOCKETTIMEDOUT
        at ClientRequest.<anonymous> (/home/translate/translation-server/node_modules/request/request.js:812:19)
        at Object.onceWrapper (events.js:313:30)
        at emitNone (events.js:106:13)
        at ClientRequest.emit (events.js:208:7)
        at TLSSocket.emitTimeout (_http_client.js:711:34)
        at Object.onceWrapper (events.js:313:30)
        at emitNone (events.js:106:13)
        at TLSSocket.emit (events.js:208:7)
        at TLSSocket.Socket._onTimeout (net.js:420:8)
        at ontimeout (timers.js:482:11)
dstillman commented 5 years ago

Interesting that the having outdated dependencies would cause translation-server to slow down but still work.

It was using an older JSDOM that was much slower.

That URL works for me in 2.4 seconds.

dhimmel commented 5 years ago

The NYT URL command from https://github.com/zotero/translation-server/issues/63#issuecomment-447530844 now works for me. Perhaps it was an upstream error or we had exceeded some NYT rate limit.