nsqio / go-nsq

The official Go package for NSQ
MIT License

frequent broken pipes when reading from/writing to queues #207

Closed: djmally closed this issue 7 years ago

djmally commented 7 years ago

Hi folks,

My team develops a distributed web fuzzing tool in Go, and we use NSQ as our message queueing system via the go-nsq client. Recently we've been seeing a very high rate of messages like these in our logs. It's unclear to us what's causing them, so we wondered if you might have more insight:

error sending RDY 0 - write tcp 10.82.11.114:7886->10.82.11.71:4153: write: broken pipe
IO error - write tcp 10.82.11.114:7886->10.82.11.71:4153: write: broken pipe

We almost always see a message like "draining... waiting for 1 messages in flight" around these errors.

We're running 8 nsqd nodes and 1 nsqlookupd instance on NSQ version 0.3.8, with the following nsqd configuration:

script
  exec ./bin/nsqd \
      -msg-timeout="600s" \
      -max-heartbeat-interval=10m0s \
      -max-msg-size=15728640 \
      -max-output-buffer-size=15728640 \
      -max-rdy-count=20000 \
      -mem-queue-size=10 \
      -sync-every=20000 \
      -tls-required=true
end script

This might also be related to https://github.com/nsqio/go-nsq/issues/199.

ploxiln commented 7 years ago

TCP connections are being unexpectedly closed. It's possible that you have a firewall or NAT killing your connections after a few minutes of inactivity. Try setting your heartbeat interval back down to the default 30 seconds.

mreiferson commented 7 years ago

Can you provide any more logs or reproducible setup?

mreiferson commented 7 years ago

This might indeed be related to #199; see https://github.com/nsqio/go-nsq/issues/199#issuecomment-294375520

djmally commented 7 years ago

Closing this and moving the discussion to #199.