Open 2bam opened 7 months ago
After further investigation, it seems at some point AsyncWebSocketClient::_onAck
stops getting called. And as messages are never flushed control pings never get sent neither (finished=0) so they never get an ack neither.
This happens after a burst of messages arrive very fast/at the same time. After every message received, I send an "OK"
response. A ping and these "OK"
s might get intertwined.
If I keep receiving messages (and responding "OK"
) at one point the message queue fills up, start getting ERROR: Too many messages queued
message, but the message queue doesn't need to be full to create this stuck state.
I've added this debug code to look into the state of the queue each time I ping
void AsyncWebSocketClient::ping(uint8_t *data, size_t len)
{
ets_printf("ping\n");
int i = 0;
for (auto it = _controlQueue.begin(); it != _controlQueue.end(); ++it)
{
ets_printf("ping ctrl q %d) fin%d op%d\n", i++, (*it)->finished() ? 1 : 0, (*it)->opcode());
}
i = 0;
for (auto it = _messageQueue.begin(); it != _messageQueue.end(); ++it)
{
auto *x = (AsyncWebSocketBasicMessage *)(*it);
ets_printf("ping mesg q %d) fin%d bwf%d len%d sent%d ack%d acked%d\n", i++, (*it)->finished() ? 1 : 0, (*it)->betweenFrames(), x->_len, x->_sent, x->_ack, x->_acked);
}
if (_status == WS_CONNECTED)
_queueControl(new AsyncWebSocketControl(WS_PING, data, len));
}
The result:
ping ctrl q 0) fin0 op9
ping ctrl q 1) fin0 op9
ping ctrl q 2) fin0 op9
ping ctrl q 3) fin0 op9
ping ctrl q 4) fin0 op9
ping ctrl q 5) fin0 op9
ping ctrl q 6) fin0 op9
ping ctrl q 7) fin0 op9
ping ctrl q 8) fin0 op9
ping ctrl q 9) fin0 op9
ping ctrl q 10) fin0 op9
ping ctrl q 11) fin0 op9
ping ctrl q 12) fin0 op9
ping ctrl q 13) fin0 op9
ping ctrl q 14) fin0 op9
ping ctrl q 15) fin0 op9
[NOTE: this ctrl list keeps growing and growing with each ping! ]
ping mesg q 0) fin0 bwf0 len2 sent0 ack0 acked2
ping mesg q 1) fin0 bwf1 len2 sent0 ack0 acked0
ping mesg q 2) fin0 bwf1 len2 sent0 ack0 acked0
ping mesg q 3) fin0 bwf1 len2 sent0 ack0 acked0
ping mesg q 4) fin0 bwf1 len2 sent0 ack0 acked0
ping mesg q 5) fin0 bwf1 len2 sent0 ack0 acked0
ping mesg q 6) fin0 bwf1 len2 sent0 ack0 acked0
ping mesg q 7) fin0 bwf1 len2 sent0 ack0 acked0
ping mesg q 8) fin0 bwf1 len2 sent0 ack0 acked0
ping mesg q 9) fin0 bwf1 len2 sent0 ack0 acked0
ping mesg q 10) fin0 bwf1 len2 sent0 ack0 acked0
ping mesg q 11) fin0 bwf1 len2 sent0 ack0 acked0
It seems the message at front of the queue is in an ill state, having acked for more than it was actually sent.
EDIT: Already found the problem. Added a PR with the full description and solution https://github.com/me-no-dev/ESPAsyncWebServer/pull/1390
I think it's an omission here
https://github.com/me-no-dev/ESPAsyncWebServer/blob/f71e3d427b5be9791a8a2c93cf8079792c3a9a26/src/AsyncWebSocket.cpp#L527
I believe it should do the same cleanup as it does for message packets, also for control packets.
Or maybe the following
if
could be awhile
to flush all of them out.https://github.com/me-no-dev/ESPAsyncWebServer/blob/f71e3d427b5be9791a8a2c93cf8079792c3a9a26/src/AsyncWebSocket.cpp#L498
My communication pattern is receiving human pressed keys which could be very fast and then very slow. Each key press gets a response sent. And the pings are more frequent and used to calculate round trip time. I start seeing problems when using pings/messages together and no problems at all if they are used separately (i.e. only pings, or only messages), regardless of the comm frequency. I'll keep investigating.