Closed richievos closed 1 year ago
I do agree, there is still a problem with this. But removing it would break connectivity for others (me at least). Unfortunately, I've fixed my problem without writing a unit test (too bad). I have to rewrite some part of the code here.
Thanks for the PR. I won't accept it, but it does show me that there is a problem I have to fix.
Can you elaborate on what the situation is that adding a 5000 to that check makes things better? I'm trying to understand if there's a different case I'm going to run into trouble in, because the main branch code fails 100% of the time for me without this change.
Not worked on this issue yet.
I think there is a problem around client vs broker client timeout
1. The client connects to the broker and arms a timeout after `ka`.
2. The broker receives the connection and creates a new client (the broker-side client), also with a timeout.
3. Both may expire at the same time; if the broker checks for the timeout first, it disconnects the client.
This is a case that happens in my roomba project using TinyMqtt.
My unit tests there are bad, because there are some strange `#if` blocks:
```cpp
#ifdef EPOXY_DUINO
  alive = millis()+500000;
#else
  alive = millis()+5000;  // TODO MAGIC client expires after 5s if no CONNECT msg
#endif
```
I have to review all this (after I've fixed the memory leak problem)
I think the broker should increase the timeout when receiving a new connection, to allow time for the client to send its first keep-alive message. Maybe the fix is as simple as that. I should then be able to remove those crazy `#if EPOXY_DUINO` blocks that modify the timeout behaviour during unit tests!
I'm not an MQTT expert, but looking at https://m2msupport.net/m2msupport/mqtt-keep-alive-pingreq-pingresp/ I think there's a bigger bug here. I'm connecting remotely to an MQTT broker running on an ESP32.
Based on what I'm seeing, and stepping through the code, the MQTT broker is sending a PingReq to the remote client. Is that intentional? I would expect the broker to only send responses.
The reason the broker sends the PingReq is that it creates a client instance and hands it the TCP connection. That client then thinks it's talking to a remote broker, so it sends the PingReq.
```cpp
void MqttBroker::loop()
{
#ifndef TINY_MQTT_ASYNC
  WiFiClient client = server->accept();
  if (client)
  {
    onClient(this, &client);                 // <-- creates the new client
  }
  ...

void MqttBroker::onClient(void* broker_ptr, TcpClient* client)
{
  debug("MqttBroker::onClient");
  MqttBroker* broker = static_cast<MqttBroker*>(broker_ptr);
  MqttClient* mqtt = new MqttClient(broker, client);   // <-- client is built
  ...

MqttClient::MqttClient(MqttBroker* local_broker, TcpClient* new_client)
  : local_broker(local_broker)
{
  ...
  tcp_client = new WiFiClient(*new_client);  // <-- tcp_client is set
#endif

  // now back in MqttBroker::loop
  for(size_t i=0; i<clients.size(); i++)
  {
    MqttClient* client = clients[i];
    if (client->connected())
    {
      client->loop();                        // <-- that new client runs
    }

void MqttClient::loop()
{
  if (keep_alive && (millis() >= alive - 5000))
  {
    if (tcp_client && tcp_client->connected())
    {
      // ***************** BUG *****************
      // None of this code should actually be running when this object
      // represents a remote connection to this broker.
      debug("pingreq");
      static MqttMessage pingreq(MqttMessage::Type::PingReq);
      pingreq.sendTo(this);
      clientAlive(0);
```
My removal of the 5000 coincidentally avoids that code running, because my client disconnects before that message would ever get sent.
Note, the fix isn't as simple as putting an `if (!local_broker && ...)` at the beginning of the check. I believe that the pingreq should always be guarded by a `!local_broker` check, but if you add that check there, then the next conditional fires:
```cpp
else if (local_broker)
{
  debug(red << "timeout client");
  close();
```
That conditional then immediately kills the connection. That happens because `alive` is set to:

```cpp
alive = millis()+5000;  // TODO MAGIC client expires after 5s if no CONNECT msg
```

which, combined with the `- 5000` in the check, means that as soon as you try to connect, the client is immediately marked as dead and closed.
I'm actually not sure how anything can connect to this broker remotely currently, outside of the async path. Completely possible I'm missing something, but it feels like there's a significant bug there.
I haven't looked into a fix, but I imagine the fix involves:
I'm at work, no time to expand, but you're right. I'm not an MQTT expert either, but a broker is not expected to send a PingReq. The point is that I wanted TinyMqtt to be as small as possible, so the same client class is used for both the broker side and the client side. This is a tricky design, not easy to maintain, but that is the cost for an embedded device, and it leads to this kind of error.
Thanks again for the report. You are making TinyMqtt evolve a lot.
I'll review in detail the PR later. No time yet, sorry
I'm not sure what the 5000 represents, but having it there breaks connectivity for me.