sindresorhus / got

🌐 Human-friendly and powerful HTTP request library for Node.js

Got cannot open specific website while Request (and browsers) can #1639

Closed papb closed 3 years ago

papb commented 3 years ago

Describe the bug

Got is failing to get the following specific website: https://www.ansys.com

However, to my surprise, the wget and curl commands also fail to fetch it. Maybe something is wrong with this particular website in a very weird way. But since Request can do it, I think Got should do it as well.

I found a related question on Stack Overflow, but it didn't help me.

Note: not sure if it matters, but I am at home, not behind a proxy (that I know of - unless my ISP uses one somehow). The websites http://www.amibehindaproxy.com/ and http://ipv4.amibehindaproxy.com/ also claim that I do not seem to be behind a proxy.

I also looked into Migrating from Request but couldn't see anything that might explain my situation.

Note: the same problem also happens with the HTTP version: http://www.ansys.com

Actual behavior

// This times out.
await got('https://www.ansys.com', { timeout: 20000 });
// (node:95) UnhandledPromiseRejectionWarning: RequestError: Timeout awaiting 'request' for 20000ms

Expected behavior

Should fetch the page successfully.

Code to reproduce

const got = require('got');

(async () => {
    await got('https://www.ansys.com', { timeout: 20000 });
})();

Extra info

Behavior from other tools:

# GNU Wget 1.20.1 built on linux-gnu.
$ wget www.ansys.com
--2021-02-25 03:02:17--  http://www.ansys.com/
Resolving www.ansys.com (www.ansys.com)... 23.197.250.253
Connecting to www.ansys.com (www.ansys.com)|23.197.250.253|:80... connected.
HTTP request sent, awaiting response... No data received.
Retrying.

--2021-02-25 03:02:29--  (try: 2)  http://www.ansys.com/
Connecting to www.ansys.com (www.ansys.com)|23.197.250.253|:80... connected.
HTTP request sent, awaiting response... No data received.
Retrying.

[CTRL+C]

# curl 7.64.0 (x86_64-pc-linux-gnu) libcurl/7.64.0 OpenSSL/1.1.1d zlib/1.2.11 libidn2/2.0.5 libpsl/0.20.2 (+libidn2/2.0.5) libssh2/1.8.0 nghttp2/1.36.0 librtmp/2.3
# Release-Date: 2019-02-06
$ curl www.ansys.com
curl: (52) Empty reply from server

// Request 2.88.2
const request = require('request');
request('http://www.ansys.com', function (error, response, body) {
  console.error('error:', error);
  console.log('statusCode:', response && response.statusCode);
  console.log('body:', body);
});
// Prints the whole HTML landing page normally

sindresorhus commented 3 years ago

If curl fails, there's definitely something wrong with the webserver. I'm not surprised browsers work. They are extremely lenient and allow a lot of incorrect behavior.

papb commented 3 years ago

Hi @sindresorhus, thanks for the reply! Still, in this case, I find it particularly surprising that Got times out. If there is something wrong with the server but browsers manage to be lenient with the response, I would expect a different error, not a timeout.

Do you have any idea what exactly Request could be doing differently from Got here?
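For context on why one client can report a timeout while another reports an empty reply: the outcome depends on whether the server stays silent or hangs up. The sketch below reproduces both behaviours locally, using only Node's built-in `net` and `http` modules. It does not use Got and does not contact the site from this issue. The helper names (`startServer`, `probe`, `demo`) and the 300 ms timeout are assumptions made for illustration.

```javascript
const http = require('http');
const net = require('net');

// Start a raw TCP server on an ephemeral local port.
// `onData(socket)` runs once the client has sent its request bytes.
function startServer(onData) {
  return new Promise((resolve) => {
    const sockets = new Set();
    const server = net.createServer((socket) => {
      sockets.add(socket);
      socket.on('error', () => {}); // ignore resets caused by the client giving up
      socket.on('close', () => sockets.delete(socket));
      socket.once('data', () => onData(socket));
    });
    // Destroy any open connections, then stop listening.
    const shutdown = () => {
      for (const socket of sockets) socket.destroy();
      return new Promise((done) => server.close(() => done()));
    };
    server.listen(0, '127.0.0.1', () => {
      resolve({ port: server.address().port, shutdown });
    });
  });
}

// Send one GET request and resolve with a label describing how it ended.
function probe(port, timeoutMs) {
  return new Promise((resolve) => {
    const req = http.get({ host: '127.0.0.1', port, path: '/', timeout: timeoutMs, agent: false });
    req.on('response', (res) => { res.resume(); resolve('response'); });
    // The 'timeout' event only signals idleness; the request must be destroyed manually.
    req.on('timeout', () => req.destroy(new Error('timeout')));
    req.on('error', (err) => resolve(err.message === 'timeout' ? 'timeout' : err.code));
  });
}

async function demo() {
  const silent = await startServer(() => {});                  // reads the request, never answers
  const closing = await startServer((socket) => socket.end()); // hangs up without a reply
  const result = {
    silent: await probe(silent.port, 300),
    closed: await probe(closing.port, 300),
  };
  await silent.shutdown();
  await closing.shutdown();
  return result;
}

demo().then((result) => console.log(result));
```

The silent server should produce a client-side timeout, while the server that hangs up should produce a connection-reset style error ("socket hang up" in Node, similar in spirit to curl's "Empty reply from server"). Which of the two a real server does can vary per client and per request, so seeing a timeout from one tool and an empty reply from another is not contradictory.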

szmarczak commented 3 years ago

I've just done curl https://www.ansys.com and I'm getting a timeout.

szmarczak commented 3 years ago

I'm pretty sure they have some anti-bot mechanism and it blocks the requests.
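One way to test the anti-bot hypothesis would be to repeat the request with browser-like headers and see whether the outcome changes. The following is a minimal sketch, assuming got v11 (CommonJS) is installed. The function names and every header value are illustrative assumptions. Nothing in this thread establishes which header, if any, the server actually inspects, so a result either way is only a hint.

```javascript
// Build got options that imitate a browser request.
// The header values below are illustrative assumptions, not values taken from this thread.
function browserLikeOptions(timeoutMs = 20000) {
  return {
    headers: {
      'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36',
      accept: 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
      'accept-language': 'en-US,en;q=0.5',
    },
    timeout: { request: timeoutMs }, // same 20 s budget as the original report by default
    retry: { limit: 0 },             // fail fast so a single attempt is observed
  };
}

// Fetch a URL with the options above and return the HTTP status code.
async function fetchLikeBrowser(url) {
  const got = require('got'); // loaded lazily; assumes got v11 is installed
  const response = await got(url, browserLikeOptions());
  return response.statusCode;
}
```

It could be called as `fetchLikeBrowser('https://www.ansys.com').then(console.log, console.error);`. If the request still times out, headers alone are probably not what the server keys on.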