no connection available to resume, backing off for 1s

lixuancn commented 6 years ago

I only speak a little English. What do these logs mean? What should be done?

Thanks

2018/09/10 19:11:17 WRN 100 [log_audit_behavior/one] no connection available to resume
2018/09/10 19:11:17 WRN 100 [log_audit_behavior/one] backing off for 1s
2018/09/10 19:11:17 WRN 90 [lab_asyncprocessapi/one] no connection available to resume
2018/09/10 19:11:17 WRN 90 [lab_asyncprocessapi/one] backing off for 1s
2018/09/10 19:11:17 WRN 91 [audit/one] no connection available to resume
2018/09/10 19:11:17 WRN 91 [audit/one] backing off for 1s
2018/09/10 19:11:18 WRN 100 [log_audit_behavior/one] no connection available to resume
2018/09/10 19:11:18 WRN 100 [log_audit_behavior/one] backing off for 1s
2018/09/10 19:11:18 WRN 90 [lab_asyncprocessapi/one] no connection available to resume
2018/09/10 19:11:18 WRN 90 [lab_asyncprocessapi/one] backing off for 1s

ploxiln commented 6 years ago

Your consumer has no connections to nsqd. You need to call one or more of the .ConnectTo*() methods and they need to succeed (maybe the connections failed for some other reason).

lixuancn commented 6 years ago

The connection to nsqd succeeds! Most of them are consumed successfully, while a small portion report the warning log.

lixuancn commented 6 years ago

qps: 5000
success: 99.6%
fail: 0.4% (warning log) (no connection available to resume, backing off for 1s)

Successes and failures happen at the same time. So I don't think the consumer has no connections to nsqd.

ploxiln commented 6 years ago

Then we'll need much more information to determine what is happening in your case. Do you have many independent consumers in the same process? In multiple processes? Are you hitting a connection limit imposed by your operating system? Is this a temporary issue because connections are taking a few seconds to be established?

Why do you think this warning message represents failure? It could be an erroneous warning. It sounds like many messages are being processed, quickly, and it does not sound like any messages are lost. A possible bug could be causing some delays and reduced throughput, but is it really a bug? Or are you hitting some limitation of your setup (open file descriptor limit, natural connection time)?
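To rule the open file descriptor limit in or out, you can print the limit from inside the consumer process. A minimal sketch, assuming a Unix-like system (it relies on `syscall.Getrlimit`); `openFileLimit` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"syscall"
)

// openFileLimit returns the soft and hard limits on open file descriptors
// for the current process. Unix-like systems only.
func openFileLimit() (soft, hard uint64, err error) {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, 0, err
	}
	return uint64(rl.Cur), uint64(rl.Max), nil
}

func main() {
	soft, hard, err := openFileLimit()
	if err != nil {
		fmt.Println("could not read RLIMIT_NOFILE:", err)
		return
	}
	// Every TCP connection to nsqd uses one descriptor, on top of the
	// files, sockets, and pipes the process already has open.
	fmt.Printf("open file limit: soft=%d hard=%d\n", soft, hard)
}
```

Compare the soft limit with the number of consumers multiplied by the number of nsqd instances each one connects to. If that product is close to the limit, new connections can fail while the existing ones keep working.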

This is the kind of situation where there's nothing we can do but pose an endless stream of questions, which we cannot answer ourselves. You will have to investigate to determine if there is a real problem or not, what is really happening, and what sequence of events and set of conditions causes it. Or, you can just ignore it :)

nsqio / go-nsq

no connection available to resume, backing off for 1s #234