node-modules / agentkeepalive

Support keepalive http agent.
MIT License
579 stars 57 forks source link

[BUG] freeSocketTimeout is used when an active request is being made #106

Open making3 opened 2 years ago

making3 commented 2 years ago

I've been troubleshooting an intermittent issue this week where many developers and our build pipeline will have NPM start erroring with ERR_SOCKET_TIMEOUT. I've dug in quite a bit and found ERR_SOCKET_TIMEOUT seems to be thrown by this library (which is used internally by the NPM client).

Unfortunately, the only suggestions with ERR_SOCKET_TIMEOUT seem to have no affect on the actual issue which is increasing several NPM configs (fetch-retry-maxtimeout and so on). However, with this issue, none of these options seem to matter.

There are several related issues in the NPM client directly, but I think this still exists in agentkeepalive. Related issues:

Current Behavior

agentkeepalive throws ERR_SOCKET_TIMEOUT even though my timeout is set to 60 seconds.

See the Output from the making3/agentkeepalive-timeout-issue for example output.

Expected Behavior

agentkeepalive does not error with ERR_SOCKET_TIMEOUT within my specified timeout (60 seconds).

Steps to Reproduce

  1. Clone the following repo - https://github.com/making3/agentkeepalive-timeout-issue
  2. Run npm ci
  3. Run one of the following:
DEBUG=agentkeepalive node slow-client.js
DEBUG=agentkeepalive node slow-server.js

Environment

npm: 8.18.0 Node.js: 18.7.0 OS Name: macOS Catalina Version 10.15.7 System Model Name: Macbook Pro npm config: ; "user" config from /Users/namehere/.npmrc

//localhost:4873/:_authToken = (protected) cache = "/Users/namehere/.npm" strict-ssl = true

; node bin location = /Users/namehere/.nvm/versions/node/v18.7.0/bin/node ; node version = v18.7.0 ; npm local prefix = /private/tmp/agentkeepalive-timeout-issue ; npm version = 8.18.0 ; cwd = /private/tmp/agentkeepalive-timeout-issue ; HOME = /Users/namehere ; Run npm config ls -l to show all defaults.

making3 commented 2 years ago

To pull in some of the debug logs and to hopefully make sense of them:

  agentkeepalive sock[0#0.0.0.0:8089:] create, timeout 60000ms +0ms
Responding with redirect to /fake-path
  agentkeepalive sock[0#0.0.0.0:8089:](requests: 1, finished: 1) free +11ms
  agentkeepalive sock[0#0.0.0.0:8089:](requests: 2, finished: 1) will be reuse on agent free event +0ms
Responding normally
  agentkeepalive sock[0#0.0.0.0:8089:](requests: 2, finished: 1) timeout after 800ms, listeners 4, defaultTimeoutListenerCount 3, hasHttpRequest true, HttpRequest timeoutListenerCount 0 +805ms
  agentkeepalive timeout listeners: onTimeout, onTimeout, emitRequestTimeout, responseOnTimeout +1ms
  agentkeepalive sock[0#0.0.0.0:8089:] destroy with timeout error +1ms
  agentkeepalive sock[0#0.0.0.0:8089:](requests: 2, finished: 1) error: Error: Socket timeout

From my understanding, the following sequence of events happen:

  1. A socket is created (sock[0#0.0.0.0:8089:0]).
  2. The HTTP Server responds.
  3. The 'free' event is emitted (since this is undocumented, I have no idea what condition that causes this to get emitted).
  4. Because something else is still requesting information (a redirect in this case), a new request is added to this socket, however, the timeout looks to be set to the freeAgentTimeout.
  5. Because my server (or client) doesn't finish within 800ms, ERR_SOCKET_TIMEOUT is thrown, as the timeout is actually 800ms and the socket tried to close with an active request.

While I don't have a clear picture if our server is slow or client is slow, the 5 minute default fetch-timeout from NPM should ideally still be used, I believe.