zotero / translation-server

A Node.js-based server to run Zotero translators

Cloudflare causing HTTP403 errors for certain URLs #110

Closed noctux closed 4 years ago

noctux commented 4 years ago

Cloudflare's "security" features appear to be blocking some of translation-server's requests, for instance when trying to access the URL https://dl.acm.org/citation.cfm?id=2038619. (This might be specific to my IP address or installation, but it happened for several ACM URLs on 3 different IPs I was using.)

(3)(+0002000): HTTP GET https://dl.acm.org/citation.cfm?id=2038619

(1)(+0000156): Error: HTTP request to https://dl.acm.org/citation.cfm?id=2038619 rejected with status 403

  InternalServerError: An error occurred retrieving the document
      at Object.throw (/home/noctux/projects/translation-server/node_modules/koa/lib/context.js:97:11)
      at module.exports.WebSession.handleURL (/home/noctux/projects/translation-server/src/webSession.js:220:19)
      at processTicksAndRejections (internal/process/task_queues.js:93:5)

The following patch fixes the issue (Cloudflare has to provide tremendous security benefits by doing that sort of blocking...):

diff --git a/src/http.js b/src/http.js
index ea02eb5..3e3dec4 100644
--- a/src/http.js
+++ b/src/http.js
@@ -98,7 +98,8 @@ Zotero.HTTP = new function() {
        }, options);

        options.headers = Object.assign({
-           'User-Agent': config.get('userAgent')
+           'User-Agent': config.get('userAgent'),
+           'Accept': '*/*'
        }, options.headers);

        let logBody = '';

However, I do not know what a sane value for Accept would be here. Since most URLs are landing pages or PDF artifacts, and are therefore served as whatever Zotero expects as the "natural" MIME type, we should probably be fine with */*. I guess the problem is not limited to ACM, but probably affects all providers using Cloudflare to "guard" their websites.
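
For illustration, here is a minimal standalone sketch of the merge pattern the patch relies on: defaults are listed first in Object.assign, so any header the caller passes explicitly still wins over the new Accept default. The helper name buildHeaders and the 'ExampleAgent/1.0' string are hypothetical and not part of translation-server; the real code does this merge inline in src/http.js with config.get('userAgent').

```javascript
// Hypothetical helper (not in translation-server): mirrors the inline
// Object.assign merge from the patch so it can be run on its own.
function buildHeaders(userAgent, callerHeaders) {
    // Defaults come first; properties in callerHeaders overwrite them.
    // An undefined callerHeaders is simply ignored by Object.assign.
    return Object.assign({
        'User-Agent': userAgent,
        'Accept': '*/*'
    }, callerHeaders);
}

// 'ExampleAgent/1.0' is a placeholder value, assumed for this example.
const defaults = buildHeaders('ExampleAgent/1.0');
const custom = buildHeaders('ExampleAgent/1.0', { 'Accept': 'text/html' });

console.log(defaults['Accept']); // */*
console.log(custom['Accept']);   // text/html
```

One caveat with this pattern: Object.assign compares keys case-sensitively, so a caller passing 'accept' in lowercase would end up with both keys in the object rather than overriding the default.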

dstillman commented 4 years ago

Thanks so much — this works great. (We had noticed the problem with ACM in particular but hadn't tracked it down.)