Open richievos opened 1 year ago
Eh, I spoke too soon. This was working great when I first tried it, but now I'm running into weirdness. Will continue testing................
Pushed a tweak which is making this stable for me. Both stable in terms of no crashes, and also memory is a lot more stable. I can't say it's flat, but it's better than it was.
Hello
The client deletion is a tricky part, as you saw. So I cannot accept a PR without a unit test, because the feature could be broken at any time. However, thanks for it, it will probably help me to solve the problem forever. And congratz for having taken your time for this PR. I don't know at all if I'm going to use your way or another one. I wanted to use shared or unique ptr. I'll see.
Best regards
Richie, are you using MqttBroker::connect ?
Also, in MqttBroker::loop, I've obviously forgot to erase the client. Which means both problem that can occur
Thanks
I fully support unit testing, and I fully understand blocking new feature adds without tests. Blocking major issues with the library on lack of unit tests seems extreme though.
My app creates the broker in setup, and then does the standard loop call in loop. I also do the same with a local client, but I don't think it affects things.
In the current main branch code, every client that connects remotely leaves a memory leak when it disconnects since it's never freed. This diff fixes that and also ensures that any other disconnected ones that should be deleted get deleted because it continues looping through the clients at it cleans up the first one. It's not clear to me why the previous code early exited from the loop, but I assume it was a workaround to handle the corrupted memory (dead pointers) the current solution seemed to leave in place.
Regarding access to a freed client in the next loop generally, I'm not clear what other cases that could occur in, but that shouldn't happen anymore for those that are created by the network connection. My guess though is that current version had additional issues because it wasn't verifying that the client should be deleted. Particularly it had inconsistent behavior between clearing the list of clients (which has checks to verify it should be deleted).
Don't worry Richie, I will fix this issue. (you can read: I cannot leave this issue alive !)
And Good news : I have added instances count, and thus, have a unit test that fails !
The reason for which I cannot merge without test is that in fact, this problem is really very touchy !
You may know that a local broker can connect to a remote (TinyMqtt) broker that connect to ... etc... This behaviour allows redundant setup which I want to have for my home. (My wife will never accept that a light cannot turn on just because .... "sorry the broker is down, give me one hour to check .." :-D
A TinyMqtt broker have 3 kind of MqttClient
In fact, this is not about rejecting the PR, this is only that I do not want this problem to appear again in later versions of the lib. And, there is precisely a reason that may break things: I've not yet implemented zeroconf, and I know already that this feature is prone to break things on that point.
Thanks for answering, you are really helping ! Best regards
Note to avoid confusion, I'll push very soon a unit test of another memory leak, not related to this one.
@richievos :
In the current main branch code, every client that connects remotely leaves a memory
leak when it disconnects since it's never freed. This diff fixes that and also ensures that any
other disconnected ones that should be deleted get deleted because it continues looping
through the clients at it cleans up the first one. It's not clear to me why the previous code
early exited from the loop, but I assume it was a workaround to handle the
corrupted memory (dead pointers) the current solution seemed to leave in place.
I'm getting upset with this issue !
I've written a test with an external client disconnecting its wifi. Thus this client is even not able to send Mqtt::Disconnect; and all is ok !! (long story is that there was a bug in EspMock / Wifi that was not disconnecting properly and this is why I thought that I had found the memory leak).
In the current main branch code, every client that connects remotely leaves a
memory leak when it disconnects since it's never freed.
--> Cannot reproduce this !!!
I'm sure you have opened this issue with a good reason, but none of my tests leads to a memory leak unfortunately.
It's not clear to me why the previous code early exited from the loop,
Precisely because the client is deleted. And while deleting it, the ptr is removed from the clients vector. The break avoid the use of an invalidated iterator. I do agree that erasing it early as you do may avoid (or not in the future) this. Breaking is a good solution as it avoid any iterator invalidation and also because the loop is shorter (next will be processed in the next call to loop, which seems not to be a problem for me).
but I assume it was a workaround to handle the corrupted memory (dead pointers)
the current solution seemed to leave in place.
There is no use of dead pointer as far as I know, before and after the PR.
I'm gonna try mosquito as you mentionned it on the issue:
$ while true
do
mosquitto_pub -h <my_esp> -p 1883 -t foobar -m '{}'
done
@hsaturn I pushed a minimal repro of this to https://github.com/richievos/tiny_mqtt_debug. It's built with platformio and you need to edit src/main.cpp to fill in your wifi creds. Note I have to build against my patch from #72 because otherwise you can't connect at all.
Reproduces 100% of the time. Shows a memory leak occurring every single connect/disconnect. The leak disappears after this PR is applied.
There's no particular reason I'm using mosquitto_pub
, it's just a convenient CLI I've found. The others I tried using don't support this version of the mqtt protocol.
Connecting to WiFi ...Connected ip=<...>
Starting MQTT broker... done
1225 free_heap=195876
2000 free_heap=195876
3000 free_heap=195884
4000 free_heap=195884
5000 free_heap=195884
6000 free_heap=195884
7000 free_heap=195884
8000 free_heap=195884
9000 free_heap=195884
export MQTT_ESP=<IP_HERE>
$ for i in {1..100}
do
echo $i
mosquitto_pub -h $MQTT_ESP -p 1883 -t foobar -m ''
sleep 1
done
44000 free_heap=195280
45000 free_heap=195312
46000 free_heap=195128
47000 free_heap=194928
48000 free_heap=194744
49000 free_heap=194560
50000 free_heap=194148
51000 free_heap=194376
52000 free_heap=194192
53000 free_heap=194008
If you let this run long enough it'll eventually stop being able to connect. Alternatively just remove the sleep and watch it explode faster.
Thanks a lot.... At a first glance I'll trust your scenario and will expect that it is true (because you are using a patched version). Then I'll try to use the main branch and fix the connect problem you have mentioned. Once connected I'll try to reproduce your steps.
This way, I may fix both problems
I need a second life, or a way to work the night and the day :-D
Thanks a lot.
Hi,
this changes are not in the current version, correct?
Because when i query every 5 sec with:
mosquitto_sub -h mqtt_broker -v -t "topic" -C 1
I can see that every connect the memory is going down a little bit.
As a workaround i used now a bridge and then i query the bridge, so i have only a single stable connection to the mqtt broker and the memory is stable.
Kind regards Stefan
When disconnecting, the client needs to be disabled and assuming it is a client created by the broker, it needs to be deleted. Previously the client was directly removed from the broker, thereby losing any reference to it and it could never be deleted.
This changes that behavior to purely invalidate the client, and the next time through the broker loop it will be properly cleaned up. This also fixes the loop code to properly erase the client from the list, and not break out early the first time it finds a disconnected client.
This doesn't fix everything, in that it seems like there's another bug that it is deleting the client without a flag check. However I'm leaving that for a separate fix.
I have tested this on my ESP32 by uploading this version of the code, running a client in an infinite loop which connects, publishes, disconnects. With the old code I can steadily see memory usage linearly leaking over time. With the new version it stays stable after the first couple times.
More in the linked issue.
Fixes #73