nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org

http request will hang forever #39341

Open liran opened 3 years ago

liran commented 3 years ago

Version

v15.10.0

Platform

Darwin Lis-iMac.local 19.6.0 Darwin Kernel Version 19.6.0: Thu Oct 29 22:56:45 PDT 2020; root:xnu-6153.141.2.2~1/RELEASE_X86_64 x86_64

Subsystem

macOS Catalina 10.15.7

What steps will reproduce the bug?

const https = require('https');

async function main() {
  const options = {
    hostname: 'www.dhgate.com',
    port: 443,
    path: '/product/magnetic-liquid-eyeliner-magnetic-false-eyelashes/481362313.html',
    method: 'GET',
    headers: {
      accept:
        'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
      'accept-language': 'en',
      'user-agent':
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36',
    },
  };
  const req = https.request(options, (res) => {
    console.log('statusCode:', res.statusCode);
    console.log('headers:', res.headers);

    res.on('data', (d) => {
      process.stdout.write(d);
    });
  });
  req.on('error', (e) => {
    console.error(e);
  });
  req.end();
}

main();

How often does it reproduce? Is there a required condition?

It happens every time, on every system I have tried.

What is the expected behavior?

The response returned by the server is received normally.

What do you see instead?

The request hangs forever.

Additional information

I have tried many versions of Node.js, and all of them show the same behavior.

AntonioWeb-dev commented 3 years ago

const EventEmitter = require('events');

class MyEmitter extends EventEmitter {}

const myEmitter = new MyEmitter();
let m = 0;
myEmitter.on('event', () => {
  console.log(++m);
});
myEmitter.emit('event');
// Prints: 1
myEmitter.emit('event');
// Prints: 2

Try something like this

Flarna commented 3 years ago

If you add the header "Connection": "keep-alive" or use a keep-alive agent such as new https.Agent({ keepAlive: true }), it works.

liran commented 3 years ago

If you add the header "Connection": "keep-alive" or use a keep-alive agent such as new https.Agent({ keepAlive: true }), it works.

I tried it just now and I can receive a normal response.

I don't know what happened inside the system, but this method solved my problem. Thank you. In addition, is this a Node.js bug? I ask because every other link I tried can be accessed normally; only dhgate.com hangs.

Flarna commented 3 years ago

I doubt that this is a Node.js bug.

In general, it's recommended to use an HTTP agent with keep-alive (and maybe limit the number of sockets); otherwise every request creates a new connection, which adds considerable overhead.
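
For illustration, a minimal sketch of that recommendation might look like the following. The socket limit of 10 and the helper name get are assumptions chosen for the example, not values taken from this thread; keepAlive and maxSockets are standard https.Agent options.

```javascript
const https = require('https');

// One shared agent for the whole process: sockets are kept open and reused
// instead of opening a new connection for every request.
// maxSockets: 10 is an arbitrary example limit on sockets per host.
const keepAliveAgent = new https.Agent({ keepAlive: true, maxSockets: 10 });

// Hypothetical helper: issues a GET request through the shared agent.
function get(url, onResponse) {
  return https.get(url, { agent: keepAliveAgent }, onResponse);
}
```

Every call such as get('https://nodejs.org', (res) => res.resume()) then goes through the same pool, so repeated requests to one host can reuse an already open connection.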

liran commented 3 years ago

Okay, then I will not close this issue for the time being. If you have any progress on this issue, you can close it at any time.

Flarna commented 3 years ago

Maybe my comment was misleading. I don't think it's a Node.js bug. And using an HTTP agent is a task for the user, not something Node.js does internally.

liran commented 3 years ago

I will close it. @Flarna thank you for your help.

liran commented 3 years ago

Sorry, I need to reopen the issue because the hang has not been completely resolved. Neither the "Connection": "keep-alive" header nor a keep-alive agent has any effect on AWS EC2. The EC2 environment is as follows:

# ec2 region
us-west-1

# system
ubuntu 18.04

# uname -a
Linux ip-172-31-8-151 5.4.0-1041-aws #43~18.04.1-Ubuntu SMP Sat Mar 20 15:47:52 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

# node -v
v15.12.0

The latest test code:

const https = require('https');

function main() {
  const options = {
    hostname: 'www.dhgate.com',
    port: 443,
    path: '/',
    method: 'GET',
    headers: {
      accept:
        'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
      'accept-language': 'en',
      'user-agent':
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36',
      // connection: 'keep-alive',
    },
    agent: new https.Agent({ keepAlive: true }),
  };

  const req = https.request(options, (res) => {
    console.log('statusCode:', res.statusCode);
    console.log('headers:', res.headers);

    res.on('data', (d) => {
      process.stdout.write(d);
    });
  });

  req.on('error', (e) => {
    console.error(e);
  });

  req.end();
}

main();

If necessary, I can record a video.

vinodkumartheking commented 3 years ago

Did you check whether a firewall or policy is blocking requests to the URL from the EC2 instance? Try another URL [www.google.com]. I ran your code on Replit.com and it still gets no response from www.dhgate.com. Maybe they are blocking it on the server side, because this looks like web scraping to me. It's just a thought. dhgate might have some mechanism that responds only if the request comes from a browser.

liran commented 3 years ago

@vinodkumartheking thank you for your reply.

A browser request can be imitated, and the user-agent header in the code serves that purpose. I am sure the firewall is not the cause, because I tested with cURL and it worked very well. Because of this problem, all Node.js-based HTTP request libraries hang. I finally solved my problem with node-libcurl, because its underlying implementation is not based on Node.js. I suspect this is a bug in Node.js.

vinodkumartheking commented 3 years ago

@liran Great finding. It might be a bug in Node.js then, and it needs more investigation. Thanks for the update.

mohd-akram commented 2 years ago

I've encountered this same issue:

require('https').get('https://uae.voxcinemas.com', console.log);

EDIT: This is a problem with the server, as this curl command shows: curl -A "" --http1.1 'https://uae.voxcinemas.com'
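
Whatever the root cause is, none of the snippets in this thread set a timeout, so a server that never answers leaves the request pending indefinitely. Below is a minimal sketch of a client-side guard. The helper name getWithTimeout and the error message are assumptions made for the example; the timeout option and the 'timeout' event are part of the standard http/https request API. The timeout measures socket inactivity rather than total request time.

```javascript
const http = require('http');
const https = require('https');

// Hypothetical helper: resolves with the response status code, or rejects
// if the socket stays idle for longer than timeoutMs.
function getWithTimeout(url, timeoutMs) {
  // Pick the module that matches the URL scheme.
  const lib = new URL(url).protocol === 'https:' ? https : http;
  return new Promise((resolve, reject) => {
    const req = lib.get(url, { timeout: timeoutMs }, (res) => {
      res.resume(); // drain the body so the socket can be released
      resolve(res.statusCode);
    });
    // 'timeout' only reports inactivity; the request must be destroyed by hand.
    req.on('timeout', () => {
      req.destroy(new Error(`no response within ${timeoutMs} ms`));
    });
    req.on('error', reject);
  });
}
```

With this in place, a call such as getWithTimeout('https://uae.voxcinemas.com', 5000).catch(console.error) reports an error once the socket has been idle for that long instead of waiting forever.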