Closed: fly2F closed this issue 6 months ago.
If you could, please make sure you are running the latest server version, which currently is 2.10.12.
Thank you very much for your reply!
I have previously upgraded the NATS server to 2.10.12, and the issue has not recurred either on 2.10.12 or on the 2.10.7 server after it was restarted.
I have not found a way to trigger the issue.
Here are the details of the consumer I am having problems with; the Consumer sequence in this consumer looks strange:
```
❯ nats consumer info '$MQTT_msgs' 'HgSLhmro_UKKHlQPcxPpTI6R1YNqdx3'
Information for Consumer $MQTT_msgs > HgSLhmro_UKKHlQPcxPpTI6R1YNqdx3 created 2024-03-29T12:47:12+08:00

Configuration:

        Durable Name: HgSLhmro_UKKHlQPcxPpTI6R1YNqdx3
    Delivery Subject: $MQTT.sub.UKKHlQPcxPpTI6R1YNqdst
      Filter Subject: $MQTT.msgs.localserver.shopId.78880634
      Deliver Policy: New
          Ack Policy: Explicit
            Ack Wait: 30.00s
       Replay Policy: Instant
     Max Ack Pending: 1,024
        Flow Control: false

State:

  Last Delivered Message: Consumer sequence: 541,868 Stream sequence: 563 Last delivery: 9.04s ago
    Acknowledgment Floor: Consumer sequence: 0 Stream sequence: 0
        Outstanding Acks: 42 out of maximum 1,024
    Redelivered Messages: 42
    Unprocessed Messages: 0
         Active Interest: Active
```
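To watch how these numbers evolve over time (for example, whether the Consumer sequence keeps climbing while the Acknowledgment Floor stays at 0), the consumer can also be polled programmatically. Below is a minimal sketch using the jnats (io.nats.client) JetStream management API; the server URL is a placeholder, and the stream/consumer names are taken from the output above.

```java
import io.nats.client.Connection;
import io.nats.client.JetStreamManagement;
import io.nats.client.Nats;
import io.nats.client.api.ConsumerInfo;

// Periodically samples the MQTT session's durable consumer so the delivered
// consumer sequence can be compared against the ack floor over time.
public class ConsumerSeqWatcher {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; point this at the affected single-node server.
        try (Connection nc = Nats.connect("nats://localhost:4222")) {
            JetStreamManagement jsm = nc.jetStreamManagement();
            while (true) {
                ConsumerInfo ci = jsm.getConsumerInfo("$MQTT_msgs", "HgSLhmro_UKKHlQPcxPpTI6R1YNqdx3");
                System.out.printf("delivered consumer_seq=%d stream_seq=%d ack_floor=%d pending_acks=%d redelivered=%d%n",
                        ci.getDelivered().getConsumerSequence(),
                        ci.getDelivered().getStreamSequence(),
                        ci.getAckFloor().getConsumerSequence(),
                        ci.getNumAckPending(),
                        ci.getRedelivered());
                Thread.sleep(30_000); // one sample per Ack Wait interval
            }
        }
    }
}
```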
Here is the relevant connz information: out_bytes and out_msgs are increasing.
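Since the report relies on the connz counters, here is a minimal sketch of how they can be sampled over time (assuming the server's HTTP monitoring port is enabled, e.g. on 8222). The host/port are placeholders, and the crude regex stands in for proper JSON parsing; it only prints the out_msgs / out_bytes values so their growth can be correlated with what the client actually receives.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Polls the NATS monitoring endpoint and prints out_msgs/out_bytes per connection.
// Assumes the server was started with a monitoring port (e.g. -m 8222).
public class ConnzWatcher {
    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();
        HttpRequest req = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8222/connz"))
                .GET()
                .build();
        // Very rough extraction; a JSON library would be cleaner.
        Pattern counters = Pattern.compile("\"(out_msgs|out_bytes)\"\\s*:\\s*(\\d+)");
        while (true) {
            String body = http.send(req, HttpResponse.BodyHandlers.ofString()).body();
            Matcher m = counters.matcher(body);
            StringBuilder line = new StringBuilder();
            while (m.find()) {
                line.append(m.group(1)).append('=').append(m.group(2)).append(' ');
            }
            System.out.println(line);
            Thread.sleep(5_000); // sample every 5 seconds
        }
    }
}
```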
I noticed this issue: https://github.com/nats-io/nats-server/issues/4736. Although I am using MQTT, my consumer's behavior is very similar to the description in this comment: https://github.com/nats-io/nats-server/issues/4736#issuecomment-1955690764. It is possible that the two issues are related.
@fly2F I am going to pursue this from the MQTT angle and will let @derekcollison comment on whether there may be an underlying JetStream issue. To make sure I understand correctly: it sounds like your client was first receiving (all) MQTT messages until it stopped, and then one of your workarounds would make the messages flow again?
Are the messages you mention QoS 1 or QoS 2 MQTT messages?
Thank you for your reply.
I am using MQTT QoS 1 and CleanSession is false (a minimal client configuration sketch is included after this comment).
Some of our MQTT clients reported the issue almost on the same day.
They can receive messages in the morning, but not in the afternoon.
We turned on Trace-level logging and observed that the server did not send PUBLISH packets to the MQTT client, even though out_msgs in connz kept increasing.
I have also observed that the Consumer sequence in the consumer's Last Delivered Message increases by 16 every 30 seconds (the Ack Wait).
These observations were made while the client was online.
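For reference, here is a minimal sketch of a subscriber configured the way described above (QoS 1, CleanSession=false) with org.eclipse.paho.client.mqttv3. The broker URL, client ID, and topic are illustrative placeholders, not the actual values from this deployment.

```java
import org.eclipse.paho.client.mqttv3.IMqttDeliveryToken;
import org.eclipse.paho.client.mqttv3.MqttCallback;
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
import org.eclipse.paho.client.mqttv3.MqttMessage;
import org.eclipse.paho.client.mqttv3.persistence.MemoryPersistence;

// Minimal QoS 1 subscriber with a persistent (CleanSession=false) session.
public class Qos1Subscriber {
    public static void main(String[] args) throws Exception {
        MqttClient client = new MqttClient("tcp://localhost:1883", "fixed-client-id", new MemoryPersistence());
        MqttConnectOptions opts = new MqttConnectOptions();
        opts.setCleanSession(false);      // keep the session (and its subscription) across connects
        opts.setAutomaticReconnect(true);

        client.setCallback(new MqttCallback() {
            @Override
            public void connectionLost(Throwable cause) {
                System.out.println("connection lost: " + cause);
            }
            @Override
            public void messageArrived(String topic, MqttMessage message) {
                System.out.println("received " + topic + ": " + new String(message.getPayload()));
            }
            @Override
            public void deliveryComplete(IMqttDeliveryToken token) { /* publisher-side only */ }
        });

        client.connect(opts);
        client.subscribe("shop/78880634/messages", 1); // QoS 1 subscription; topic is a placeholder
    }
}
```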
@fly2F Thank you, I am trying to reproduce the issue now. Is there any more MQTT-related context or any estimates you can provide? How many clients are there, do they reconnect often (or at all) prior to the issue, and what is the traffic (number of messages and message size)? Anything "interesting" about the subscriptions? Wildcards used?
@levb Thank you.
I have two single-node NATS servers that experienced problems with MQTT clients not receiving messages within 2 days. These two servers are used for testing. One has about 50 clients and the other has about 30. They do not use wildcards and have very little traffic. The number of messages sent is roughly between 200 and 500. I also have three 3-node NATS clusters with more clients than these two, and they have never had this problem.
I restarted one server and upgraded the other to 2.10.12. It ran for 3-5 days and the problem did not recur. I did not find a way to reproduce it.
Since we use fixed Client IDs, clients are often repeatedly kicked offline on these two test servers when the same Client ID reconnects. However, I specifically tested two clients with identical Client IDs preempting each other for two consecutive days and could not reproduce this problem (a rough takeover sketch follows this comment).
The message size is small, about 100 bytes
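To illustrate the scenario described above (two clients sharing one fixed Client ID repeatedly preempting each other), here is a rough sketch of such a test using org.eclipse.paho.client.mqttv3. The broker URL, shared Client ID, and topic are hypothetical placeholders.

```java
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
import org.eclipse.paho.client.mqttv3.persistence.MemoryPersistence;

// Alternately connects two clients that share a single fixed Client ID, so each
// connect triggers a session takeover and kicks the other client offline.
public class ClientIdTakeoverTest {
    public static void main(String[] args) throws Exception {
        MqttConnectOptions opts = new MqttConnectOptions();
        opts.setCleanSession(false); // same persistent-session setup as in production

        MqttClient a = new MqttClient("tcp://localhost:1883", "shared-client-id", new MemoryPersistence());
        MqttClient b = new MqttClient("tcp://localhost:1883", "shared-client-id", new MemoryPersistence());

        while (true) {
            takeover(a, opts);
            takeover(b, opts);
        }
    }

    // Connects the given client (taking over the shared session), re-subscribes,
    // then waits before the other client takes the session back.
    private static void takeover(MqttClient client, MqttConnectOptions opts) throws Exception {
        if (!client.isConnected()) {
            client.connect(opts);
            client.subscribe("shop/78880634/messages", 1); // QoS 1; topic is a placeholder
        }
        Thread.sleep(10_000);
    }
}
```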
Thanks! We did fix a number of "mechanical" (reliability) issues in 2.10.12 relative to the early 2.10 releases, so please monitor on your end and I will add a long-running test to reflect your use case.
@levb Thanks!
The strange thing is that when the MQTT client connection cannot receive messages, out_msgs and out_bytes in connz keep increasing, but the Trace-level log contains no PUBLISH entries for that client.
Could we record some MQTT metrics during PUBLISH (for example around mqttEnqueuePublishMsgTo) to compare against the out_msgs metric?
I have upgraded the version and I cannot reproduce the problem.
@levb @derekcollison Thanks!
Observed behavior
I am having trouble receiving messages with my MQTT client.
Checking the durable consumer shows that the Consumer sequence in the Last Delivered Message is increasing, but the acknowledgment floor is always 0.
Workarounds:
1. Restart the NATS server.
2. Delete the corresponding record in $MQTT_sess on the server and then restart the client. This will also rebuild the session and the client will start receiving messages again.
Expected behavior
The client should receive the message.
Server and client version
Server: NATS 2.10.7, 1 node
Client: Java, org.eclipse.paho.client.mqttv3
Host environment
Host: Amazon Linux 2, kernel 5.10.184-175.749.amzn2.x86_64
Docker: 20.10.25
Steps to reproduce
1. Check connz. The connection status is normal.
2. Check the connz sub list. It contains a topic and a delivery subject starting with $MQTT.sub.
3. Check the delivery subject in connz. It is consistent with the records in $MQTT.sess and the configuration in the corresponding durable consumer.
4. Check out_bytes and out_msgs in connz. They are increasing.