postwait / node-amqp

[UNMAINTAINED] node-amqp is an AMQP client for nodejs
MIT License
1.69k stars 357 forks source link

Connection.end() causes an emit of ECONNRESET #300

Closed OJezu closed 7 years ago

OJezu commented 10 years ago

When I run some of the tests, e.g

node test/test-basic-return.js

I get

/home/ch/node_modules/amqp/node_modules/longjohn/dist/longjohn.js:184
        throw e;
              ^
Error: read ECONNRESET
    at errnoException (net.js:901:11)
    at onread (net.js:556:19)
---------------------------------------------
    at module.exports.run (/home/ch/node_modules/amqp/test/harness.js:49:23)
    at Object.<anonymous> (/home/ch/node_modules/amqp/test/test-basic-return.js:1:84)
    at Module._compile (module.js:456:26)
    at Module._extensions..js (module.js:474:10)
    at Module.load (module.js:356:32)
    at Module._load (module.js:312:12)
    at Module.runMain (module.js:497:10)
    at startup (node.js:119:16)

It seems it is connected to #295 and #246, however the code from #248 does not solve it.

It is a hell of a bug, because it caused me to look for a few hours for a bug in exchange.publish, and then it turned out it is some networking stuff happening upon connection.end().

djhojd commented 10 years ago

+1, same here

theartoflogic commented 10 years ago

I am getting the same error, however, it's limited to only the servers in my prod environment. My local environment, dev, and stage all work fine. However, it's 100% reproducible on the prod servers. The interesting thing is that they all have the same server setup (except my local env).

I have tried using the latest stable version of the node-amqp library (that does not include the explicit connectionCloseOk handler), as well as with the connectionCloseOk handler and both have the same issue.

After using node-inspector to step through the entire execution flow (very, very tediously) I still could not figure out exactly why it was throwing the ECONNRESET error. The connectionClose event to the rabbit server gets sent fine, and I get the connectionCloseOk response from the server. However, once it calls socket.end() after that it always throw this exception.

If I use netstat to see how many sockets are open to the rabbit server I can see that the socket has not been closed, so eventually rabbit starts blocking the connections as my node-pool continues to create new connections (thinking the old ones were closed properly).

I've been able to fix the ECONNRESET error by modifying the call to socket.end() to be socket.destroy() instead after it receives the connectionCloseOk response. With that change I no longer get the ECONNRESET error, and the socket is properly closed. However, I'm not sure if that's the proper way to close the socket, or if there may be other unintended consequences that arise from that change, but so far I haven't noticed any ill effects and it's been running in my prod environment for the past few weeks.

Here is some information on the environment where this issue happens:

OS: Debian Squeeze (v6.0.8) Node: v0.10.5 RabbitMQ: v3.1.5 node-amqp: v0.1.8

jgato commented 10 years ago

I have a similar issue using node v0.10.X but it disappears if I use node 0.8. I am not saying this is a solution, but...

rafaelkaufmann commented 10 years ago

+1 for this issue on CentOS 6.3, Node 0.10.22, RabbitMQ 3.2.3, node-amqp v0.1.8. Also, I did not get the error on my development machine, which is a Mac OS Mavericks installation with same configurations otherwise.

rafaelkaufmann commented 10 years ago

@theartoflogic: you should be aware that socket.destroy() does not actually end the connection. I tried to use @glenjamin 's disconnect branch, which features a connectionCloseOk handler, and replaced socket.end() with socket.destroy() there. This did shut up the exception, but a look at rabbitmqadmin list connections afterwards will show you that the connection still exists as far as the Rabbit server is concerned. These dangling connections can very well build up...

theartoflogic commented 10 years ago

@rafaelkaufmann the socket.destroy() call, for me at least, does seem to close the socket. I did the same thing...I used a branch with the connectionCloseOk handler, changed socket.end() to socket.destroy() and everything started working properly: there was no longer an exception, the socket did not show up when using netstat -an | grep 5672, and the RabbitMQ admin console showed that the connection was no longer open.

I'm not sure why it wouldn't work for you, but I'd chalk it up to the weirdness of this bug in general. It seems to vary under what circumstances it happens, and it seems that it varies under which conditions it's "fixed" as well. For now, it's working for me and that's the most important thing. I'm still hoping for an actual fix for it though. I just wanted to present my findings as far as what worked for me in case it can help anyone else in the short term.

rafaelkaufmann commented 10 years ago

That's truly bizarre @theartoflogic. Thanks for sharing. :+1:

OJezu commented 10 years ago

It seems that RabbitMQ responds to TCP FIN packet with RST packet:

 35 1.061573000    127.0.0.1 -> 127.0.0.1    TCP 66 47322 > amqp [FIN, ACK] Seq=356 Ack=512 Win=32768 Len=0 TSval=123271910 TSecr=123271909
 36 1.061735000    127.0.0.1 -> 127.0.0.1    TCP 66 amqp > 47322 [RST, ACK] Seq=512 Ack=357 Win=32768 Len=0 TSval=123271910 TSecr=123271910

It happens no matter if the node-amqp is from master, or from #248. It can either be a bug or working-as-intented on RabbitMQ side. I would file a bug upstream, but no idea where their bug tracker is (mailing list?)

glenjamin commented 10 years ago

My branch doesn't stop other bits of the system from trying to send data once the socket is closed - I was a bit lazy there.

Say you have a continuous publish(), then you trigger a close - the publish that happens after socket.end() / connectionCloseOk will cause the exception described above.

Possible Solutions:

  1. Make sure your app stops trying to send data after a close
  2. Extend the branch to make node-amqp throw a more obvious error in this scenario
  3. Call socket.end() and socket.destroy() in the handler - this should stop data being sent after the close think.
rafaelkaufmann commented 10 years ago

@glenjamin, I wound up fleeing from the problem :) Meaning I refactored my app to be less dependent on the details of connection management...

OJezu commented 10 years ago

@glenjamin it is not problem of app sending data after close. It seems that RabbitMQ responds (on TCP level) with RST to client's FIN. I did some tracing, and there is no packet sent to RabbitMQ after amqp's Close. RST may be intentional, i.e. RabbitMQ does not wish to do full for way close, but cuts the connection short with RST. If that is the case, then probably something should handle and quiet ECONNRESET after closeOk is received.

pruthvireddypuresoftware commented 6 years ago

test/test-basic-return.js: Error: read ECONNRESET at TCP.onread (net.js:660:25)

at Object.run (/node/node-amqp/test/harness.js:49:23)
at Object.<anonymous> (/node/node-amqp/test/test-basic-return.js:1:22)
at Module._compile (internal/modules/cjs/loader.js:689:30)
at Object.Module._extensions..js (internal/modules/cjs/loader.js:700:10)
at Module.load (internal/modules/cjs/loader.js:599:32)
at tryModuleLoad (internal/modules/cjs/loader.js:538:12)
at Function.Module._load (internal/modules/cjs/loader.js:530:3)
at Function.Module.runMain (internal/modules/cjs/loader.js:742:12)
at startup (internal/bootstrap/node.js:266:19)

Facing above issue in debian, nodejs v10.8 and RabbitMQ 3.3.5. Downloaded the current commit and connection.js file has destroy function called. case methods.connectionCloseOk: debug && debug("Received close-ok from server, closing socket"); this.socket.end(); this.socket.destroy(); break;

Please suggest me any other solution. Thanks.