tomas / needle

Nimble, streamable HTTP client for Node.js. With proxy, iconv, cookie, deflate & multipart support.
https://www.npmjs.com/package/needle
MIT License
1.62k stars 235 forks

Tunnelling doesn't work in v3.1.0 #406

Open pkey opened 2 years ago

pkey commented 2 years ago

Description

With the new version of needle (3.1.0), the documented approach to setting up tunnelling no longer works (version 3.0.0 works fine). Capturing the traffic with Wireshark shows that needle ends up looping CONNECT requests to the proxy. I suspect that needle uses the tunnel agent while also applying the proxy settings from the environment (HTTP_PROXY and HTTPS_PROXY), and thus ends up in this weird state.

We were previously using global-agent together with the _PROXY environment variables (HTTP_PROXY and HTTPS_PROXY) to force needle to make CONNECT requests, but since the new version that doesn't work either (which might be a different issue).

How to reproduce

  1. set up a local mitmproxy (or any other proxy)
  2. set up tunnelling using the tunnel agent as described in the documentation, pointing proxy and port to the mitmproxy.
  3. set up HTTP_PROXY and HTTPS_PROXY environment variables to point to the same proxy
  4. try and make a network call using needle

Expected behaviour

When HTTP_PROXY and HTTPS_PROXY are set and a tunnel agent is configured, needle should make a CONNECT request to the proxy and establish a tunnel.

Actual behaviour

needle tries to establish CONNECT request but doesn't succeed

tomas commented 2 years ago

Hi and thanks for the detailed bug report. Would it be possible to see a small code snippet so I can reproduce the error quickly?

pkey commented 2 years ago

Here's the code snippet:

var needle = require('needle');
var tunnel = require('tunnel');
var myAgent = tunnel.httpOverHttp({
  proxy: { host: '127.0.0.1', port: 8080 }
});

needle.get('https://github.com/status', { agent: myAgent }, function (error, response) {
  if (!error && response.statusCode == 200)
    console.log(response.body);
  else console.log(error);
});

Make sure to npm install needle tunnel and then set the _PROXY environment variables to point to the same proxy (in my case 127.0.0.1:8080) as the agent configuration above.

When I run this with needle version 3.1.0 installed, in Wireshark, I see attempts to CONNECT 127.0.0.1:8080 HTTP/1.1 and HTTP/1.1 502 Bad Gateway (text/html) whereas with version 3.0.0, I can see CONNECT github.com:443 HTTP/1.1 and HTTP/1.1 200 Connection established - which is what I would expect. Mind that both result in Server disconnected but the symptoms are the same ones we are experiencing in our own system so I think this is a good example.

Let me know how it goes. I will also try to debug this, though I am not very familiar with the needle codebase.

neonnoon commented 1 year ago

Let me add a bit more context and details.

What we are trying to achieve is to use needle with an HTTP/S proxy. In a secure setup, HTTP clients are expected to send proxied requests for HTTPS resources through an HTTP CONNECT tunnel.
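To make the distinction concrete, here is a minimal sketch of the two request forms a client can send to an HTTP proxy for an HTTPS target: a CONNECT tunnel request versus plain forwarding with an absolute URL. `proxyRequestHead` is a hypothetical helper written for illustration only; it is not part of needle or of any agent library, and it only builds the text of the request head rather than opening a connection.

```javascript
// Sketch only: builds the request head a client would send to an HTTP proxy.
// `proxyRequestHead` is a hypothetical name, not an API of needle.
function proxyRequestHead(targetUrl, useTunnel) {
  const url = new URL(targetUrl);
  const defaultPort = url.protocol === 'https:' ? 443 : 80;
  const port = url.port || defaultPort;

  if (useTunnel) {
    // Tunnel: ask the proxy to open a raw TCP connection to host:port.
    // TLS is then negotiated end to end through that connection.
    const authority = `${url.hostname}:${port}`;
    return `CONNECT ${authority} HTTP/1.1\r\nHost: ${authority}\r\n\r\n`;
  }

  // Forwarding: send the absolute URL and let the proxy fetch it itself.
  return `GET ${url.href} HTTP/1.1\r\nHost: ${url.host}\r\n\r\n`;
}

// Request line for the tunnel case: "CONNECT github.com:443 HTTP/1.1"
console.log(proxyRequestHead('https://github.com/status', true));
// Request line for the forwarding case: "GET https://github.com/status HTTP/1.1"
console.log(proxyRequestHead('https://github.com/status', false));
```

With forwarding, the proxy sees the full URL and has to make the upstream request itself; with a tunnel, it only relays bytes, which is why the tunnel form is the one expected for HTTPS targets.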

As far as I understand, needle doesn't support CONNECT requests. Therefore we are using https://github.com/gajus/global-agent to patch node's http agent to provide CONNECT-capable proxy support. There are similar libraries like https://github.com/koichik/node-tunnel (deprecated) or https://github.com/TooTallNate/node-http-proxy-agent.

Since https://github.com/tomas/needle/pull/382, needle picks up HTTP_PROXY/HTTPS_PROXY from the environment and cannot be used without a proxy if those environment variables are present.

We're looking for a way to opt-out of needle picking up proxy configuration from environment variables.

neonnoon commented 10 months ago

@tomas, any chance you could have a look at this, especially PR #427 as a suggestion to disable this behaviour?

tomas commented 10 months ago

Yes, sorry. I'll make some time this week to take a look into this. :)

neonnoon commented 10 months ago

Thanks @tomas, no rush, just wanted to make sure you've seen it. Let me know if there's anything you'd like me to change in the PR.

neonnoon commented 9 months ago

Hey @tomas, any chance you could have a look at #427, which makes it possible to stop needle from automatically picking up the environment variables?

dklimpel commented 8 months ago

During the analysis and debugging, I noticed that no consistent distinction is made between http_proxy and https_proxy. It is sufficient if one of the two is set, then this is used for all connections. If both are set, the http_proxy is used.
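For contrast, here is a minimal sketch of per-scheme selection, following the common convention that http_proxy applies to http:// targets and https_proxy to https:// targets. `proxyFromEnv` is a hypothetical helper written for this comment; it is not needle's implementation, and the upper/lower-case lookup order is an assumption.

```javascript
// Sketch only: choose the proxy by the scheme of the target URL.
// `proxyFromEnv` is a hypothetical name, not an API of needle.
function proxyFromEnv(targetUrl, env) {
  const scheme = new URL(targetUrl).protocol; // 'http:' or 'https:'
  const names = scheme === 'https:'
    ? ['HTTPS_PROXY', 'https_proxy']
    : ['HTTP_PROXY', 'http_proxy'];

  // Return the first variable that is set for this scheme, or null for "no proxy".
  for (const name of names) {
    if (env[name]) return env[name];
  }
  return null;
}

// Example environment; the ports are made up for illustration.
const exampleEnv = {
  HTTP_PROXY: 'http://localhost:8888',
  HTTPS_PROXY: 'http://localhost:9999'
};
console.log(proxyFromEnv('https://github.com/status', exampleEnv)); // http://localhost:9999
console.log(proxyFromEnv('http://example.com/', exampleEnv));       // http://localhost:8888
```

Under this convention, setting only HTTPS_PROXY leaves plain http:// requests unproxied, which differs from the behaviour described above, where whichever variable is set is used for all connections.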

dklimpel commented 8 months ago

My test snippet:

var needle = require('needle');
needle.get('https://github.com/status', function (error, response) {
  if (!error && response.statusCode == 200)
    console.log(response.body);
  else console.log(error);
});

The results (via proxy: export HTTPS_PROXY=http://localhost:8888):

squid tinyproxy
HTTPS websites ⚠️
HTTP sites

For HTTPS pages, the connection goes through tinyproxy, but the proxy tries to connect to the destination via HTTP. An attempt with curl through tinyproxy works without problems.

Some return values:

error from squid

CacheErrorInfo - ERR_READ_ERROR&body=CacheHost: d4e570ebcbe2
ErrPage: ERR_READ_ERROR
Err: [none]
TimeStamp: Fri, 29 Dec 2023

ClientIP: 10.10.x.x
ServerIP: github.com

HTTP Request:
GET /status HTTP/1.1
Accept: */*
User-Agent: Needle/3.3.0 (Node.js v18.17.1; linux x64)
Host: github.com
Connection: close

some output from node:

_header: 'GET https://github.com/status HTTP/1.1\r\n' +
  'accept: */*\r\n' +
  'user-agent: Needle/3.3.0 (Node.js v18.17.1; linux x64)\r\n' +
  'host: github.com\r\n' +
  'Connection: close\r\n' +
  '\r\n',
method: 'GET',
path: 'https://github.com/status',
host: 'localhost',
protocol: 'http:',
statusCode: 502,
statusMessage: 'Bad Gateway',

I cannot see a CONNECT request being sent anywhere.

curl example

curl https://github.com/status -v
* Uses proxy env variable HTTPS_PROXY == 'http://localhost:8888'
*   Trying 127.0.0.1:8888...
* Connected to (nil) (127.0.0.1) port 8888 (#0)
* allocate connect buffer!
* Establish HTTP proxy tunnel to github.com:443
> CONNECT github.com:443 HTTP/1.1
> Host: github.com:443
> User-Agent: curl/7.81.0
> Proxy-Connection: Keep-Alive

dklimpel commented 8 months ago

I have found a workaround for me:

var { ProxyAgent } = require('proxy-agent');
var needle = require('needle');
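// Let proxy-agent handle the proxy and stop needle from reading the env vars itself.
needle.get('https://github.com/status', { agent: new ProxyAgent(), use_proxy_from_env_var: false }, function (error, response) {
  if (!error && response.statusCode == 200)
    console.log(response.body);
  else console.log(error || response); // response is undefined when error is set
});

neonnoon commented 8 months ago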

This is similar to the setup in which we use needle. We use proxy-agent or global-agent, but the earlier change that made needle pick up the env variables broke this.

@tomas with use_proxy_from_env_var: false implemented, in my opinion, we can close this issue. @dklimpel happy to leave this open if your case isn't fully covered yet.

dklimpel commented 8 months ago

IMHO this is still open. Needle claims to support:

  • HTTP Proxy forwarding, optionally with authentication

That is not the case: there is no support for https_proxy at the moment.