hsaturn / TinyMqtt

ESP 8266 / 32 / WROOM Small footprint Mqtt Broker and Client
GNU General Public License v3.0
198 stars 41 forks source link

Remove disconnect memory leak #74

Open richievos opened 1 year ago

richievos commented 1 year ago

When disconnecting, the client needs to be disabled and assuming it is a client created by the broker, it needs to be deleted. Previously the client was directly removed from the broker, thereby losing any reference to it and it could never be deleted.

This changes that behavior to purely invalidate the client, and the next time through the broker loop it will be properly cleaned up. This also fixes the loop code to properly erase the client from the list, and not break out early the first time it finds a disconnected client.

This doesn't fix everything, in that it seems like there's another bug that it is deleting the client without a flag check. However I'm leaving that for a separate fix.

I have tested this on my ESP32 by uploading this version of the code, running a client in an infinite loop which connects, publishes, disconnects. With the old code I can steadily see memory usage linearly leaking over time. With the new version it stays stable after the first couple times.

More in the linked issue.

Fixes #73

richievos commented 1 year ago

Eh, I spoke too soon. This was working great when I first tried it, but now I'm running into weirdness. Will continue testing................

richievos commented 1 year ago

Pushed a tweak which is making this stable for me. Both stable in terms of no crashes, and also memory is a lot more stable. I can't say it's flat, but it's better than it was.

hsaturn commented 1 year ago

Hello

The client deletion is a tricky part, as you saw. So I cannot accept a PR without a unit test, because the feature could be broken at any time. However, thanks for it, it will probably help me to solve the problem forever. And congratz for having taken your time for this PR. I don't know at all if I'm going to use your way or another one. I wanted to use shared or unique ptr. I'll see.

Best regards

hsaturn commented 1 year ago

Richie, are you using MqttBroker::connect ?

Also, in MqttBroker::loop, I've obviously forgot to erase the client. Which means both problem that can occur

Thanks

richievos commented 1 year ago

I fully support unit testing, and I fully understand blocking new feature adds without tests. Blocking major issues with the library on lack of unit tests seems extreme though.

My app creates the broker in setup, and then does the standard loop call in loop. I also do the same with a local client, but I don't think it affects things.

In the current main branch code, every client that connects remotely leaves a memory leak when it disconnects since it's never freed. This diff fixes that and also ensures that any other disconnected ones that should be deleted get deleted because it continues looping through the clients at it cleans up the first one. It's not clear to me why the previous code early exited from the loop, but I assume it was a workaround to handle the corrupted memory (dead pointers) the current solution seemed to leave in place.

Regarding access to a freed client in the next loop generally, I'm not clear what other cases that could occur in, but that shouldn't happen anymore for those that are created by the network connection. My guess though is that current version had additional issues because it wasn't verifying that the client should be deleted. Particularly it had inconsistent behavior between clearing the list of clients (which has checks to verify it should be deleted).

hsaturn commented 1 year ago

Don't worry Richie, I will fix this issue. (you can read: I cannot leave this issue alive !)

And Good news : I have added instances count, and thus, have a unit test that fails !

The reason for which I cannot merge without test is that in fact, this problem is really very touchy !

You may know that a local broker can connect to a remote (TinyMqtt) broker that connect to ... etc... This behaviour allows redundant setup which I want to have for my home. (My wife will never accept that a light cannot turn on just because .... "sorry the broker is down, give me one hour to check .." :-D

A TinyMqtt broker have 3 kind of MqttClient

In fact, this is not about rejecting the PR, this is only that I do not want this problem to appear again in later versions of the lib. And, there is precisely a reason that may break things: I've not yet implemented zeroconf, and I know already that this feature is prone to break things on that point.

Thanks for answering, you are really helping ! Best regards

hsaturn commented 1 year ago

Note to avoid confusion, I'll push very soon a unit test of another memory leak, not related to this one.

hsaturn commented 1 year ago

@richievos :

In the current main branch code, every client that connects remotely leaves a memory
 leak when it disconnects since it's never freed. This diff fixes that and also ensures that any
 other disconnected ones that should be deleted get deleted because it continues looping
 through the clients at it cleans up the first one. It's not clear to me why the previous code 
early exited from the loop, but I assume it was a workaround to handle the 
corrupted memory (dead pointers) the current solution seemed to leave in place.

I'm getting upset with this issue !

I've written a test with an external client disconnecting its wifi. Thus this client is even not able to send Mqtt::Disconnect; and all is ok !! (long story is that there was a bug in EspMock / Wifi that was not disconnecting properly and this is why I thought that I had found the memory leak).

In the current main branch code, every client that connects remotely leaves a
 memory leak when it disconnects since it's never freed.

--> Cannot reproduce this !!!

I'm sure you have opened this issue with a good reason, but none of my tests leads to a memory leak unfortunately.

It's not clear to me why the previous code early exited from the loop,

Precisely because the client is deleted. And while deleting it, the ptr is removed from the clients vector. The break avoid the use of an invalidated iterator. I do agree that erasing it early as you do may avoid (or not in the future) this. Breaking is a good solution as it avoid any iterator invalidation and also because the loop is shorter (next will be processed in the next call to loop, which seems not to be a problem for me).

but I assume it was a workaround to handle the corrupted memory (dead pointers)
 the current solution seemed to leave in place.

There is no use of dead pointer as far as I know, before and after the PR.

hsaturn commented 1 year ago

I'm gonna try mosquito as you mentionned it on the issue:

$ while true
do
mosquitto_pub -h <my_esp> -p 1883 -t foobar -m '{}'

done
richievos commented 1 year ago

@hsaturn I pushed a minimal repro of this to https://github.com/richievos/tiny_mqtt_debug. It's built with platformio and you need to edit src/main.cpp to fill in your wifi creds. Note I have to build against my patch from #72 because otherwise you can't connect at all.

Reproduces 100% of the time. Shows a memory leak occurring every single connect/disconnect. The leak disappears after this PR is applied.

There's no particular reason I'm using mosquitto_pub, it's just a convenient CLI I've found. The others I tried using don't support this version of the mqtt protocol.

Repro steps

  1. clone the repo
  2. update src/main.cpp with your wifi creds
  3. upload it to a device, I'm using an ESP32
  4. open a serial monitor and observe output such as:
    Connecting to WiFi ...Connected ip=<...>
    Starting MQTT broker... done
    1225 free_heap=195876
    2000 free_heap=195876
    3000 free_heap=195884
    4000 free_heap=195884
    5000 free_heap=195884
    6000 free_heap=195884
    7000 free_heap=195884
    8000 free_heap=195884
    9000 free_heap=195884
  5. note the stable in memory usage
  6. start sending publishes. I am using the following (zsh), first exporting export MQTT_ESP=<IP_HERE>
    $ for i in {1..100}
    do
    echo $i
    mosquitto_pub -h $MQTT_ESP -p 1883 -t foobar -m ''
    sleep 1
    done
  7. note that you now see memory constantly decreasing
    44000 free_heap=195280
    45000 free_heap=195312
    46000 free_heap=195128
    47000 free_heap=194928
    48000 free_heap=194744
    49000 free_heap=194560
    50000 free_heap=194148
    51000 free_heap=194376
    52000 free_heap=194192
    53000 free_heap=194008

    If you let this run long enough it'll eventually stop being able to connect. Alternatively just remove the sleep and watch it explode faster.

  8. Switch to updated code by updating platformio.ini switching to TinyMqtt from https://github.com/richievos/TinyMqtt.git#main
  9. Run again
  10. note you see stable memory and it never loses the ability to connect
hsaturn commented 1 year ago

Thanks a lot.... At a first glance I'll trust your scenario and will expect that it is true (because you are using a patched version). Then I'll try to use the main branch and fix the connect problem you have mentioned. Once connected I'll try to reproduce your steps.

This way, I may fix both problems

I need a second life, or a way to work the night and the day :-D

Thanks a lot.

StefanR71 commented 8 months ago

Hi,

this changes are not in the current version, correct?

Because when i query every 5 sec with:

mosquitto_sub -h mqtt_broker -v -t "topic" -C 1

I can see that every connect the memory is going down a little bit.

As a workaround i used now a bridge and then i query the bridge, so i have only a single stable connection to the mqtt broker and the memory is stable.

Kind regards Stefan