AsyncWebSocketControl packets were being sent more than once by AsyncWebSocketClient::_runQueue().
This blocks sending queued packets and eventually trigger ERROR: Too many messages queued serial message.
This PR adds a guard to avoid re-sending, similar to what AsyncWebSocketBasicMessage and AsyncWebSocketMultiMessage counterparts already do.
Cause
Couldn't track down or reason why it was happening, but I traced it via serial empirically.
The side effect when the same control packet is sent twice is described next.
Scenario
Let's say you send the same 2-byte control packet twice (by error). And also a 10-byte message once.
To clarify, there is only ONE control packet in the control queue, and only one message in the message queue.
Control packets get priority and are sent first, thus also ack'ed first.
AsyncWebSocketClient::_onAck callback will remove the first control packet from the control queue.
Then either
on the same _onAck (4-byte ack) call, due to len leftover to ack the second repeated packet...
or on the following _onAck (2-byte ack) call as the control queue is now empty and goes directly to the message...
...make the 10-byte message incorrectly ack'ed partially for 2-bytes from the second (wrong) control packet.
Then a following _onAck call, with the 10-bytes length for the message, will overflow the message as ack'ed for 12 bytes!
This breaks _runQueue() as mesg.betweenFrames() forever is false (_ack == _acked turns into 10 == 12).
Consequences
AsyncWebSocketClient::_runQueue stops working.
Control queuing will never send controls as the there is a front message in the message queue which will be in perpetual NOT betweenFrames, inhibiting control send.
Message queuing knows it has already sent everything, and it's just waiting ack for the front message, blocking the rest from being sent.
The control queue can grow and grow, making the device run slower.
The message queue at some point starts issuing ERROR: Too many messages queued to the serial output.
Messages/controls are never sent again.
Follow up
I must also recommend adding an error flag or serial message if there is any len leftover on an _onAck as if it's ever caused by something else it will very likely lead to stalling like in this scenario.
Problem
AsyncWebSocketControl
packets were being sent more than once byAsyncWebSocketClient::_runQueue()
.This blocks sending queued packets and eventually trigger
ERROR: Too many messages queued
serial message.This PR adds a guard to avoid re-sending, similar to what
AsyncWebSocketBasicMessage
andAsyncWebSocketMultiMessage
counterparts already do.Cause
Couldn't track down or reason why it was happening, but I traced it via serial empirically. The side effect when the same control packet is sent twice is described next.
Scenario
Let's say you send the same 2-byte control packet twice (by error). And also a 10-byte message once. To clarify, there is only ONE control packet in the control queue, and only one message in the message queue. Control packets get priority and are sent first, thus also ack'ed first.
AsyncWebSocketClient::_onAck
callback will remove the first control packet from the control queue. Then either_onAck
(4-byte ack) call, due tolen
leftover to ack the second repeated packet..._onAck
(2-byte ack) call as the control queue is now empty and goes directly to the message......make the 10-byte message incorrectly ack'ed partially for 2-bytes from the second (wrong) control packet.
Then a following
_onAck
call, with the 10-bytes length for the message, will overflow the message as ack'ed for 12 bytes! This breaks_runQueue()
asmesg.betweenFrames()
forever is false (_ack == _acked
turns into10 == 12
).Consequences
AsyncWebSocketClient::_runQueue
stops working.betweenFrames
, inhibiting control send.The control queue can grow and grow, making the device run slower. The message queue at some point starts issuing
ERROR: Too many messages queued
to the serial output. Messages/controls are never sent again.Follow up
I must also recommend adding an error flag or serial message if there is any
len
leftover on an_onAck
as if it's ever caused by something else it will very likely lead to stalling like in this scenario.