nodemcu / nodemcu-firmware

Lua based interactive firmware for ESP8266, ESP8285 and ESP32
https://nodemcu.readthedocs.io
MIT License

Platform can run out of memory when sending network traffic #3231

Open pjsg opened 4 years ago

pjsg commented 4 years ago

Expected behavior

The platform does not run out of memory

Actual behavior

It does (sometimes) on the dev branch.

It appears that the GC mode we are currently using only runs the GC when there is little memory available. The network stack does not appear to use the Lua allocation primitives, so its allocations can fail without ever triggering a collection. In particular, I see 'E:M 1520'.

In my case, this is solved by putting an explicit collectgarbage call before each network send -- but this doesn't seem like a very satisfactory solution.

The options that I see:

Thoughts?

nwf commented 4 years ago

I know this isn't going to be a hugely popular response, but... our LwIP is ancient: a random checkout with Espressif patches lobbed into it, and then our own on top (recall #3040). Rather than continuing to maintain and adapt this... pile of software, possibly the least painful long-term strategy is to assess how much memory we lose moving over to the Espressif 8266 RTOS SDK, which has a much newer LwIP in it, and adopt that as the new substrate, assuming it's tolerable.

TerryE commented 4 years ago

Tuning the EGC and stepping up is still TBD. All mallocs are mapped to pvPortMalloc etc. and it would be straightforward to map them to our own wrapper which kicks off EGC if necessary. There is a risk that this might cause timing bumps, but IMO this is a more robust response than just letting the stack fail. I did suggest this as an option in our EGC and memory discussions, but the feedback at the time was to wait for now. I can bump the priority of this.
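A minimal sketch of the wrapper idea described above, not the firmware's actual code: try the raw allocator, and if it fails, run a collection hook once and retry. Everything here is hypothetical for illustration: `gc_malloc`, `gc_malloc_configure`, and the `demo_*` toy heap are invented names, the raw allocator stands in for `pvPortMalloc`, and the hook stands in for whatever would trigger the Lua emergency GC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hook types: raw_alloc_t stands in for pvPortMalloc,
 * egc_hook_t for a function that asks the Lua VM to collect garbage. */
typedef void *(*raw_alloc_t)(size_t);
typedef void (*egc_hook_t)(void);

static raw_alloc_t raw_alloc = malloc;
static egc_hook_t egc_hook = NULL;

/* Install the underlying allocator and the (optional) GC hook. */
void gc_malloc_configure(raw_alloc_t alloc, egc_hook_t hook) {
    assert(alloc != NULL);
    raw_alloc = alloc;
    egc_hook = hook;
}

/* Allocate; on failure run the GC hook once and retry exactly once.
 * The retry is where the "timing bump" mentioned above would occur. */
void *gc_malloc(size_t size) {
    void *p = raw_alloc(size);
    if (p == NULL && egc_hook != NULL) {
        egc_hook();
        p = raw_alloc(size);
    }
    return p;
}

/* --- Toy heap used only to exercise the wrapper (assumption) --- */
static size_t demo_free_bytes = 0;    /* bytes currently available */
static size_t demo_garbage_bytes = 0; /* bytes a collection would free */
static int demo_gc_runs = 0;          /* how many times the hook ran */

void demo_reset(size_t free_bytes, size_t garbage_bytes) {
    demo_free_bytes = free_bytes;
    demo_garbage_bytes = garbage_bytes;
    demo_gc_runs = 0;
}

/* Fails when the request exceeds the remaining byte budget. */
void *demo_alloc(size_t n) {
    if (n > demo_free_bytes) return NULL;
    demo_free_bytes -= n;
    return malloc(n);
}

/* Pretend collection: all garbage becomes free memory. */
void demo_collect(void) {
    demo_free_bytes += demo_garbage_bytes;
    demo_garbage_bytes = 0;
    demo_gc_runs++;
}
```

With the toy heap configured via `gc_malloc_configure(demo_alloc, demo_collect)` and `demo_reset(100, 2000)`, a 1520-byte request fails on the first attempt, triggers one collection, and then succeeds. If there is no garbage to reclaim, the wrapper still returns `NULL`, so callers must keep their failure handling.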

As to LwIP, IIRC Johny and Arnold were quite keen to move from the then closed source Espressif version to an open source version, but Espressif have (i) switched to an Open Source version that they share with ESP8266 RTOS and expose on github, and (ii) have done quite a lot of tweaks and fixes that we haven't mirrored.

However, we have also done some tweaks that we might want to retain. One that I can think of was my moving the memory buffer for the DNS cache from static allocation, which always took ~3Kb RAM even if you didn't do DNS resolution, to JiT allocation, which took no extra RAM if you didn't use DNS and typically < ¼Kb if you did one name resolution.
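To illustrate the static versus just-in-time trade-off described above, here is a rough sketch of a cache that holds no RAM until the first entry is stored. This is not the actual NodeMCU LwIP change; the `demo_dns_*` names, the entry layout, and `DEMO_NAME_MAX` are all assumptions made up for the example.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_NAME_MAX 64 /* hypothetical maximum hostname length */

/* One cache entry, allocated on demand instead of living in a
 * statically sized table. */
typedef struct demo_dns_entry {
    char name[DEMO_NAME_MAX];
    unsigned char addr[4]; /* IPv4 address bytes */
    struct demo_dns_entry *next;
} demo_dns_entry;

/* Empty list: no heap is used until the first successful put. */
static demo_dns_entry *demo_dns_cache = NULL;

/* Store a resolved name; returns 0 on success, -1 on failure. */
int demo_dns_cache_put(const char *name, const unsigned char addr[4]) {
    if (strlen(name) >= DEMO_NAME_MAX) return -1;
    demo_dns_entry *e = malloc(sizeof *e);
    if (e == NULL) return -1;
    strcpy(e->name, name);
    memcpy(e->addr, addr, 4);
    e->next = demo_dns_cache;
    demo_dns_cache = e;
    return 0;
}

/* Look up a name; returns the 4 address bytes or NULL if absent. */
const unsigned char *demo_dns_cache_get(const char *name) {
    for (demo_dns_entry *e = demo_dns_cache; e != NULL; e = e->next) {
        if (strcmp(e->name, name) == 0) return e->addr;
    }
    return NULL;
}

/* Heap bytes currently held by the cache (excluding malloc overhead). */
size_t demo_dns_cache_bytes(void) {
    size_t total = 0;
    for (demo_dns_entry *e = demo_dns_cache; e != NULL; e = e->next) {
        total += sizeof *e;
    }
    return total;
}
```

Before any lookup is cached, `demo_dns_cache_bytes()` reports 0; after one `demo_dns_cache_put`, it reports the size of a single entry. The cost of this design is that a put can now fail at run time when the heap is exhausted, which a static table never does.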

But doing some version rebase is probably worth considering.

pjsg commented 4 years ago

I think that this is a serious issue -- we need to determine a short-term approach on this...

I'm inclined to force a collectgarbage call on every send...

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

HHHartmann commented 2 years ago

Stale bot seems to be a bit lazy. But I would like to see the quick fix at least.