uber-archive / node-statsd-client

Node.js client for statsd
ISC License

Not safe for running in a node cluster (like PM2) #36

Open blak3r2 opened 8 years ago

blak3r2 commented 8 years ago

Learned the hard way today that this will crash PM2.

There is a bug in Node related to UDP, and I believe it is the cause: https://github.com/nodejs/node-v0.x-archive/issues/9261

I do not know why this statsd client triggers it while node-statsd does not. I wanted to post here before I forget, in case it helps others.

Raynos commented 8 years ago

Interesting. We've not seen this edge case with node cluster & UDP itself.

Sorry about that, you should look into using a different statsd client or not using cluster.

At Uber, we do not use cluster.

Raynos commented 8 years ago

I updated the README with a caveat, thanks for pointing this out.

blak3r2 commented 8 years ago

Just to clarify: there is a way to use UDP, as the npm node-statsd package doesn't seem to cause this issue (at least not anymore). I will dig deeper when I get time and report back.

Thanks for putting this buffered statsd library together. I definitely had fewer packets dropped when I was running it (unclustered).

Raynos commented 8 years ago

@blak3r2 it looks like the issue is that node-statsd creates a single UDP socket, whereas uber-statsd-client can create multiple UDP sockets: it de-allocates the UDP socket on inactivity and re-allocates it lazily once needed (where inactivity is, by default, 1000 milliseconds of no UDP writes).
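
To make the allocate-lazily, close-on-idle behaviour concrete, here is a minimal sketch built on Node's `dgram` module. It is not the library's actual source: the `EphemeralSocket` class, the `socketTimeout` and `createSocket` options, and the `socketsCreated` counter are all hypothetical names chosen for illustration.

```javascript
'use strict';
const dgram = require('dgram');

// Hypothetical illustration only; this is not uber-statsd-client's source.
class EphemeralSocket {
  constructor(options) {
    this.host = options.host;
    this.port = options.port;
    // Idle time in milliseconds before the socket is closed.
    this.socketTimeout = options.socketTimeout || 1000;
    // Injectable socket factory (an assumption, to keep the sketch testable).
    this.createSocket = options.createSocket ||
      (() => dgram.createSocket('udp4'));
    this.socket = null;
    this.idleTimer = null;
    this.socketsCreated = 0;
  }

  send(message) {
    if (this.socket === null) {
      // Lazy allocation: a fresh socket for each burst of activity.
      this.socket = this.createSocket();
      this.socket.on('error', () => {}); // metrics are fire-and-forget
      this.socketsCreated += 1;
    }
    // Every write pushes the idle deadline back.
    clearTimeout(this.idleTimer);
    this.idleTimer = setTimeout(() => this.close(), this.socketTimeout);
    this.socket.send(Buffer.from(message), this.port, this.host);
  }

  close() {
    clearTimeout(this.idleTimer);
    this.idleTimer = null;
    if (this.socket !== null) {
      this.socket.close();
      this.socket = null;
    }
  }
}
```

In this sketch, a process that writes, goes quiet for longer than `socketTimeout`, and writes again ends up with `socketsCreated` greater than one, which is the "multiple UDP sockets" pattern described above.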

@blak3r2 I suspect that if you use a module like process-reporter ( https://github.com/rf/process-reporter ), which continuously sends stats on an interval, you can "avoid" the issue.
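
If you do not want to pull in another module, the same keep-alive idea can be sketched by hand. This is only an illustration, not how process-reporter is implemented: `startHeartbeat` and the `heartbeat` metric name are made up, and the `client` is assumed to be any object exposing a `gauge(name, value)` method.

```javascript
'use strict';

// Hypothetical helper: keeps a stats client busy so its socket never idles.
// `client` is assumed to expose gauge(name, value).
function startHeartbeat(client, intervalMs) {
  const beat = () => client.gauge('heartbeat', 1);
  beat(); // send once right away
  const timer = setInterval(beat, intervalMs);
  // Return a function that stops the heartbeat.
  return function stop() {
    clearInterval(timer);
  };
}
```

The interval would need to be shorter than the socket timeout (for example 500 ms against the default 1000 ms) for the socket to stay open.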

You could also tweak socket_timeout ( https://github.com/uber/node-statsd-client#optionssocket_timeout ); setting it to 10 minutes or 10 hours should avoid the multiple UDP sockets issue.
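
As a rough sketch, the options object could look like the following. Only `socket_timeout` comes from the README link above; that it is expressed in milliseconds is an assumption based on the 1000 ms default, and the `host` and `port` values are placeholders.

```javascript
'use strict';

const TEN_MINUTES_MS = 10 * 60 * 1000;

// Options to hand to the statsd client when constructing it.
const statsdOptions = {
  host: 'statsd.example.com', // placeholder host
  port: 8125,                 // conventional statsd UDP port (assumed)
  // Assumed to be milliseconds, matching the 1000 ms default.
  socket_timeout: TEN_MINUTES_MS
};
```

Check the README for the exact constructor call before copying this.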

blak3r2 commented 8 years ago

Hi @raynos I just wanted to thank you for your reply!

I am not sure that is quite it, because we would have been sending a flood of packets. I am going to set up some further tests this week and will report back.