philbowles / PangolinMQTT

PangolinMQTT - ArduinoIDE client library for ESP8266, ESP32 and STM32-NUCLEO
Other
70 stars 22 forks source link

Conserving RAM #20

Closed leifclaesson closed 2 years ago

leifclaesson commented 4 years ago

I've been having some problems with one of my esp8266 projects randomly rebooting, particularly when there's lots of activity. With a pi zero capturing debug serial output, I was able to trace the problem to a possible WiFi congestion situation when a lot of updates are happening. Each publish operation eats ~250 bytes of heap memory, with the PublishPacket class alone using 122 bytes if i remember correctly. Until it gets through the TCP pipe, it stays in the heap.

I'm running at at ~8k free (and that's after hours and hours of ram optimization of my own code already) so updating three values thrice a second would quickly eat all the heap memory if packets weren't going out -- especially if there was a connection issue, since connected() returned true also when it was attempting a connection.

Those progressive status updates (how open my gate is, while it's opening) aren't actually that important, certainly not worth crashing over. So, now that I know the problem, as a workaround, I've made it so that the update rate slows down if available heap goes below a certain threshold, and I eschew updating certain parameters completely if available heap goes below a lower threshold.

So far so good, but here are the questions. Would it be possible to:

  1. Get a handle to a publish so I can check if it has gone out or not when the next update comes along? Perhaps I don't want to send another update then. Or, better yet:
  2. Could a publish operation have an "volatile" flag, where if there's an earlier update to the same topic in the output pipeline, then simply overwrite the value in the previous update rather than allocating a new object?
  3. Reduce the size of the packet class? It has a lot of pointers, and they use a lot of space. Could it be optimized? Every bit helps. std::map also has overhead, not to mention lambda functions. On an ESP32 it's not a big issue but on an ESP8266 this really matters.
proddy commented 4 years ago

just out of interest does PangolinMQTT use a smalerl heap memory footprint than AsynMqttClient? I know the code is smaller and much more optimized but interested in the heap before I make the switch. I'm always struggling with available heap and defragmentation on an ESP8266 as all my projects run a telnet server, web server and lots of other fancy server stuff. Still 8kb free seems pretty low in your case. I've also noticed on my tests that lambda functions are faster and nicer on memory than using std::bind for example. I haven't looked deeply into the Pangolin code but it may make sense to put each publish operation on the stack even if its only 250 bytes.

leifclaesson commented 4 years ago

I didn't compare the memory usage directly, but my gut feeling is that PangolinMQTT may use more heap than AsyncMQTT but considering that AsyncMQTT deallocates things that are still in use, it's really not a fair comparison. PangolinMQTT is a much better idea regardless... since it works and all.

Yes, 8K is not a lot. Web server, telnet server, but most importantly a full local SH1106 hierarchical menu system.

Since writing this post yesterday, I just spent three hours RAM-optimizing my own LeifHomieLib this morning, with great results. sizeof(HomieProperty) was 120 bytes when I started, not counting per-object heap allocations.

The first thing I did was to remove String strTopic and String strSetTopic from each HomieProperty, replacing them with GetTopic() and GetSetTopic() functions that created the topic string on the stack instead. sizeof() dropped to 88 bytes, and of course the storage for all that string data went away too.

Each empty String object takes 12 bytes. Pointers only take 4 bytes. So if a string is often empty, it's better to not allocate it at all and just leave a null pointer. With Get/Set accessor functions, this can be done transparently. std::vector also uses 12 bytes. If it's optional and often not used (read-only properties certainly do not need a callback lambda!) then this also saves 8 bytes.

So, with that I'm down to sizeof(HomieProperty)==72 bytes.

Then there's the enum. I have a handful of values.. but sizeof(enum) is 4 bytes!! Okay, so we change it to uint_8. Three bytes gone.

Then there's all those pesky little booleans.. There were 7 of them, in three blocks, so alignment rules also waste memory. I replaced them with a single uint8_t and get/set accessor functions that mask out individual bits. Uses more flash but saves RAM.

I also moved uint_8 flags; and the uint8_t enum; so they're together at the end of the structure, to prevent 32-bit memory alignment wastage.

In the end, I got sizeof(HomieProperty) down from 120 to 64 bytes with no loss in functionality.. but I did have to search and replace a bunch in my codebase to change direct variable access to accessor functions. That's included in the three hours, though!

Before the modifications, after all my homie-related initialization I was down to 13904 bytes heap. After, 15992 bytes. That's more than two kilobytes gained. That's huge -- I'm now idling at over 10 kilobytes instead of 8. NO loss in functionality. Perhaps a miniscule loss of speed, but speed isn't a problem in the ESP8266 -- RAM is.

I am certain that if Phil looks through Pangolin with a fine-toothed comb, he will find plenty of things like this that can be done to save memory. It's certainly not a hog as it is, but it can certainly get better.

Pablo2048 commented 4 years ago

The main problem with Pangolin is not just the memory consumption, but the heap fragmentation. On my system with AsyncMQTT (corrected to use the COPY flag) I have heap fragmentation up to 11%, with PangolinMQTT i've seen heap fragmentation about 30%. This is probably because of using C++ standards like std::vector which does memory reallocation so the heap fragmentation raises. This raise, combined with badly use of ESP.getFreeHeap() instead of using ESP.getMaxFreeBlockSize() make getMaxPayloadSize() unuseable after some time. All this (and some others) are the reasons which stops me using PangolinMQTT and vent back to the AsyncMQTT.

proddy commented 4 years ago

@leifclaesson thanks for explaining how you tackled the heap consumption. I've too spent many hours profiling my code to squeeze out every bit of available memory. BTW I stumbled across an optimized version of AsyncWebServer which also helped tremendously.

@Pablo2048 yes the fragmentation is what we should really be worrying about. Containers like std::vector and Ardunio's String are notorious for creating memory holes on ESP8266s. In my experience any frag > 40% gets problematic. I was able to reduce this in my code by only using std::strings and allocating initial space for std::vectors using the .reserve() function. Also .resize() with strings when they go out of scope.

leifclaesson commented 4 years ago

stops me using PangolinMQTT and vent back to the AsyncMQTT

Oooh snap, Phil's not gonna like that :-). Jokes aside, I'm sure it will improve. It's an early alpha. The structure is there, it just needs some refinement. There's no way I could go back to AsyncMQTT at this point. I do hope Phil will come back soon and spend some time on PangolinMQTT again.

Pablo2048 commented 4 years ago

:D yes, some issues can be refined, but some not I'm afraid. For example PangolinMQTT can be used just as an singleton which is another show stopper for me. And it seems like the library has stalled in its alpha state...

leifclaesson commented 4 years ago

And it seems like the library has stalled in its alpha state...

I really hope that's not the case and that Phil is just on vacation. But, just in case, do you have a link to your fixed AsyncMQTT with the copy flag fix, whatever that means? Did that solve the use-after-free problem?

Pablo2048 commented 4 years ago

Well, because I mangled it from many sources I don't have repository for it so I've attached it for you. Basically it's async-mqtt-client with applied pull-requests 156, 157, 159, 180 and then modified to use TCP_FLAG_WRITE_COPY (and some fine modifications like 'MQTT' instead of 'M', 'Q',... thing). I also suggest to use ESPAsyncTCP library with correct cansend version from here https://github.com/glmnet/ESPAsyncTCP/tree/patch-cansend For me it solve almost all problems. async-mqtt-client.zip

leifclaesson commented 4 years ago

Hey, does anyone know what the Heap Fragmentation figure actually means? From looking at the numbers for a while and trying to make sense of them, I seem to be getting within the ballpark just by dividing ESP.getFreeHeap() by ESP.getMaxFreeBlockSize and subtracting one.. and if that's the case, 30% heap fragmentation, if it means 8000 bytes free, 6100 bytes max alloc, isn't nearly as scary as it sounds. Thinking back to the days before SSDs, 30% fragmentation on a hard drive would have been insane.. but if it just takes one tiny block out of place (in the middle of the otherwise allocable area) to get to a horrible figure like this, that really doesn't mean that much.

Although my system usually idles around 30% fragmentation, I've sometimes seen it suddenly dive down to under 10% all on its own.

Pablo2048 commented 4 years ago

I take a look at my currently used core and see, that the implementation now relies on umm_fragmentation_metric and count with blocks, not physical RAM bytes so I'm unable (at least for now) to tell more. As I said - my application at runtime has somewhere between 3 - 11% heap fragmentation with AsyncMQTT client even when I did some ArduinoJSON6 deserialization, but it's probably because I try to play nice with my heap and try to made all memory-dynamic operation only when the system starts up so in runtime there is only network buffers dynamic allocation (I hope...). Edit: here https://cpp4arduino.com/2018/11/06/what-is-heap-fragmentation.html I found some info about heap fragmentation metric just for record...

leifclaesson commented 4 years ago

Thanks, Pablo! From that article, it does look like a simple way of calculating it is pretty much as I guessed above. Even if there's a more advanced way of calculating, the simple method seems to get close. In any case, as long as it's stable over time, I might be okay for now. For example, just now I saw 8.8 kilobytes free total, 4.9 kilobytes max block, 33% heap fragmentation. But, the system has been up and running for 32 hours now (something that did not happen before my recent optimizations), so I won't make any large allocations at this point.

philbowles commented 4 years ago

Yes, I have been "on vacation" (of sorts). Anyway, I will look into these issue in more detail when I get some more time - I have not abandoned the project, I just have a lot of other more important things to do at the moment.

As with any open source project, people are free to fork and change things they don't like. Reverting to a library with as many problems as AsyncMQTT which fundemantally doesn't work, even with "copy on send" set is a pretty retrograde step IMHO, but "horses for courses"

8k free heap is pretty disastrous - I'd love to know what else your code is doing! I get twitchy with any of my code if it drop below about 16... I do have plans to ptimise one of two minor things internally on the mempry manage,ent front, but I doubt of they will make a significant improvement to the inevitable fragmentation problem :(

philbowles commented 4 years ago

PangolinMQTT can be used just as an singleton which is another show stopper for me.

@Pablo2048 Can you explain why the singleton is a problem? Do you have multiple threads all sending asynchronously? Also can you provide more detail on:

I take your point re max free block calculation and will look into changing that.

philbowles commented 4 years ago

@leifclaesson Initial thoughts:

Get a handle to a publish so I can check if it has gone out or not when the next update comes along? Perhaps I don't want to send another update then.

Typical round-trip publish times are: 6-8ms @ Qos0, So I can't see the utility of a "handle" to something with such a short lifetime. If your app can safely omit / throttle its onw messages then the obvious - to me - answer is simply to slow down your sending rate

Could a publish operation have an "volatile" flag, where if there's an earlier update to the same topic in the output pipeline, then simply overwrite the value in the previous update rather than allocating a new object?

I guess that's possible, but i'm loath to add / complicate exsiting functionality for what seems at first to be a rare edge case that can be handled by the user simply slowing the send rate.

Reduce the size of the packet class? It has a lot of pointers, and they use a lot of space. Could it be optimized? Every bit helps. std::map also has overhead, not to mention lambda functions. On an ESP32 it's not a big issue but on an ESP8266 this really matters.

I spent a lot of time trying to optimise everything within the constraints of the basic design decision of using C++ stdlib. Of course there could be more and I will look into that next time I'm hacking the code. I feel that shaving a few extra bytes (if possible) would merely be a "sticking plaster" solution - at best its only going to save a handful. For me the real issue is running any kind of code with only 8kb free heap - I'm happy to look at your code and offer suggestions that might free up more heap.

Does std::map have heap overheads? If the logic dictates that a map is required to provide the required functionalty then there are two options: 1 ) std::map 2) hand-rolled C implementation which is going to take weeks, be bug-prone, less efficient and probably slower. I challenge any programmer to improve on the efficieny of std::map. The question really becomes - coud the internal strategy be chnaged so that a map is not required? The answer is "no" while the MQTT protocol is as it is.

The same argument applies to lambda functions - replacing them with a hand-rolled function pointer scheme or totatlly restructuring the code to use a different strategy is a non-starter, I'm afraid.

philbowles commented 4 years ago

One final thought: It's always good to read the documentation fully, for example, this bit:

Obviously I will check the issues here from time to time, but if you want a rapid resonse, I can usually be found moderating on of these FB groups, including a new one especially set up for Pangolin users:

Pangolin Support ESP8266 & ESP32 Microcontrollers ESP Developers H4/Plugins support ...

Pablo2048 commented 4 years ago

@Pablo2048 Can you explain why the singleton is a problem? Do you have multiple threads all sending asynchronously?

Of course - I do have many ESP-base devices in field (actually more than 200), installed by my customers, sending data via MQTT (so far AsyncMQTTClient) to their MQTT brokers. On every device there is possibility to turn on MQTT VPN - VPN network to my service MQTT broker. If customer want/need my assistance, then he enable MQTTVPN block, which create new virtual LWIP interface inside ESP8266, connect to my broker and allow me to access the device even behind several levels of NAT. Almost everything is working - ping, web, websocket, even OTA (with some quirks). Actually my customers likes it very much, because I can modify running scripts to suit their (changing) needs very quickly... So I do need two instances of MQTT client - one for their main data broker and one for the service/on demand broker.

philbowles commented 4 years ago

@Pablo2048 Thanks for explaining that - I will consider allowing multiple instances of the broker. I woll therefore mark this thread as an enhancement and leave it open.

leifclaesson commented 4 years ago

Hi Phil!

Typical round-trip publish times are: 6-8ms @ Qos0, So I can't see the utility of a "handle" to something with such a short lifetime. If your app can safely omit / throttle its onw messages then the obvious - to me - answer is simply to slow down your sending rate

Indeed, the typical ones are not the problem, it's the atypical ones I'm worried about. Slowing down the rate is exactly what I want to do -- when MQTT isn't keeping up. It's certainly safe to use any update rate (MQTT does not affect the actual operation of the gate, only status is reported), it's even safe to omit them completely -- it just won't be very responsive in the user interface. 2 Hz is the preferred rate (decent compromise between overhead and responsiveness), which can still quickly eat a lot of memory in case the updates aren't getting through, so that's when I'd want to hold off, as long as I know when to.

8k free heap is pretty disastrous - I'd love to know what else your code is doing! I'm happy to look at your code and offer suggestions that might free up more heap.

That would be great! I appreciate the offer.

The primary purpose is a master/slave gate controller -- controlling two sliding gates (the ones in my profile picture) and making sure they stay in sync. The master/slave runs on separate custom PCBs with an esp8266 on each board with RS-485 communication between them. It counts the pulses from the gate motors to know the status and position, and also adds features like narrow open (for motorbikes / walking) and auto-close (by monitoring the IR beam and counting down).

It's doing quite a few other things too, though. Here's a few:

It's a long term project so no real rush, it's running and in service now, current uptime is 6 days 10 hours, but it would be nice to free up a few more kilobytes of ram as well as stabilize memory use. Right now the only dynamic, unpredictable memory user is MQTT.

Running the project fully requires quite a bit of hardware (I use a breadboard with an ESP32 as the gate simulator when developing here), but getting it to where you can at least navigate menus needs only an ESP8266, an SSD1306/SH1106 display, a rotary encoder with push button, and an MCP23017 I2C I/O Expander, which you may already have laying around since you're into MCUs.

I build with Sloeber (eclipse) with the Arduino core. I can absolutely package up the source code in such a way that it'll build elsewhere, it's all very modular.

I don't use facebook but I'm on skype, whatsapp, signal, and obviously e-mail. What works for you?