phoddie / node-red-mcu

Node-RED for microcontrollers

How much should I be able to get into a Wemos D1 Mini ESP8266 #90

Closed: colinl closed this issue 1 year ago

colinl commented 1 year ago

With an ESP8266 Wemos D1 Mini, I find that the target crashes after connecting to Wi-Fi when the flow has an Inject, Trigger and GPIO Out node flashing the onboard LED, plus an MQTT In node followed by three Function nodes (each containing just return msg;) and then an MQTT Out node. If I remove one of the Function nodes, it runs OK.

Am I just hitting the limit of what this hardware can do? I am building it using -p nodemcu.

Here is the flow.

[{"id":"f92bb64b93bfc46e","type":"tab","label":"mcu test","disabled":false,"info":"","env":[],"_mcu":{"mcu":true}},{"id":"0a44cdba4c3f6a65","type":"debug","z":"f92bb64b93bfc46e","name":"debug 85","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","_mcu":{"mcu":true},"x":900,"y":100,"wires":[]},{"id":"90e9ccdae3d60741","type":"mqtt out","z":"f92bb64b93bfc46e","name":"","topic":"test/mcu/pid/result","qos":"","retain":"","respTopic":"","contentType":"","userProps":"","correl":"","expiry":"","broker":"75fb29423f8a0770","_mcu":{"mcu":true},"x":610,"y":140,"wires":[]},{"id":"8c687e519a7bd590","type":"mqtt in","z":"f92bb64b93bfc46e","name":"","topic":"test/mcu/pid/set/#","qos":"1","datatype":"json","broker":"75fb29423f8a0770","nl":false,"rap":true,"rh":0,"inputs":0,"_mcu":{"mcu":true},"x":100,"y":80,"wires":[["4ce47a86c7a14eab"]]},{"id":"4ce47a86c7a14eab","type":"function","z":"f92bb64b93bfc46e","name":"function 1","func":"\nreturn msg;","outputs":1,"noerr":0,"initialize":"","finalize":"","libs":[],"_mcu":{"mcu":true},"x":260,"y":80,"wires":[["f4e5d2f56c212e2b"]]},{"id":"ed0d63aad317e829","type":"inject","z":"f92bb64b93bfc46e","name":"","props":[{"p":"payload"}],"repeat":"2","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"1","payloadType":"num","_mcu":{"mcu":true},"x":190,"y":320,"wires":[["a83bf3122efb505c"]]},{"id":"a83bf3122efb505c","type":"trigger","z":"f92bb64b93bfc46e","name":"","op1":"0","op2":"1","op1type":"num","op2type":"num","duration":"500","extend":false,"overrideDelay":false,"units":"ms","reset":"","bytopic":"all","topic":"topic","outputs":1,"_mcu":{"mcu":true},"x":380,"y":320,"wires":[["32dcd5abbb576d04"]]},{"id":"32dcd5abbb576d04","type":"rpi-gpio out","z":"f92bb64b93bfc46e","name":"","pin":"2","set":"","level":"0","freq":"","out":"out","bcm":true,"_mcu":{"mcu":true},"x":550,"y":320,"wires":[]},{"id":"f4e5d2f56c212e2b","type":"function","z":"f92bb64b93bfc46e","name":"function 2","func":"\nreturn msg;","outputs":1,"noerr":0,"initialize":"","finalize":"","libs":[],"_mcu":{"mcu":true},"x":380,"y":160,"wires":[["1b6498b943de30db"]]},{"id":"1b6498b943de30db","type":"function","z":"f92bb64b93bfc46e","name":"function 3","func":"\nreturn msg;","outputs":1,"noerr":0,"initialize":"","finalize":"","libs":[],"_mcu":{"mcu":true},"x":400,"y":220,"wires":[["90e9ccdae3d60741"]]},{"id":"75fb29423f8a0770","type":"mqtt-broker","name":"Owl2 for mcu","broker":"192.168.49.83","port":"1883","clientid":"","autoConnect":true,"usetls":false,"protocolVersion":"4","keepalive":"60","cleansession":true,"birthTopic":"","birthQos":"0","birthPayload":"","birthMsg":{},"closeTopic":"","closeQos":"0","closePayload":"","closeMsg":{},"willTopic":"","willQos":"0","willPayload":"","willMsg":{},"userProps":"","sessionExpiry":"","_mcu":{"mcu":false}}]
phoddie commented 1 year ago

It looks like this particular project should work on the ESP8266. I can reproduce your crash, so something is wrong. But running the same flow on an ESP32 with the same memory partition doesn't crash.

Unfortunately, it isn't entirely obvious where or why it is failing. The ESP8266 is notoriously difficult to debug at the native level (xsbug works great, of course!). I'll leave this open to investigate further when I have more time.

To address the question in the title, your mileage may vary. Different nodes consume very different amounts of memory. Each node uses a small amount of memory for basic bookkeeping. Beyond that, it depends on the node implementation, which is a function of both implementation style and functionality. The nodes implemented specifically for the MCU try to be reasonably memory efficient. The node implementations taken from Node-RED itself are generally fairly memory intensive. The Trigger node is one of those; I'd like to rewrite it. At least the common paths could use far less memory.

colinl commented 1 year ago

OK, thanks. I don't need to use four Function nodes; I was just experimenting and was surprised when it failed with that flow. The flow in the plugin issue (where the problem turned out to be related to not preloading the flows) is more complex and includes two contrib nodes, and it does run OK, though it doesn't have a Trigger node. If I have time, I will see if I can deduce anything further.

phoddie commented 1 year ago

I had a little time to come back to this. The crash is caused by a native stack overflow. The stack is just 4 KB, so there's not much margin. Still, that's usually plenty. My guess is that the garbage collector is overflowing because of some unusually long object chain. I want to confirm that though.
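To make the garbage-collector hypothesis concrete: a collector that marks objects by recursion uses one native frame per link, so an unusually long chain can exhaust a small fixed stack, while a collector that keeps pending objects on an explicit worklist does not. The JavaScript sketch below is a generic illustration under that assumption, not XS's collector; makeChain, markRecursive and markIterative are hypothetical names, and the host engine's own stack limit stands in for the 4 KB native stack.

```javascript
// Generic illustration, not XS's garbage collector: compare stack use of
// recursive vs worklist-based marking over a long linked chain of objects.

// Build a singly linked chain of `length` objects.
function makeChain(length) {
  let head = null;
  for (let i = 0; i < length; i++) {
    head = { marked: false, next: head };
  }
  return head;
}

// Recursive marking: one stack frame per object, so stack use grows
// linearly with the length of the chain.
function markRecursive(obj) {
  if (obj === null || obj.marked) return 0;
  obj.marked = true;
  return 1 + markRecursive(obj.next);
}

// Worklist marking: pending objects live in an ordinary array (heap memory),
// so stack depth stays constant however long the chain is.
function markIterative(root) {
  const work = [root];
  let count = 0;
  while (work.length > 0) {
    const obj = work.pop();
    if (obj === null || obj.marked) continue;
    obj.marked = true;
    count++;
    work.push(obj.next);
  }
  return count;
}

// A chain long enough to exhaust the host engine's stack when walked recursively.
let recursiveOverflowed = false;
try {
  markRecursive(makeChain(1_000_000));
} catch (e) {
  // V8 (Node.js) reports "Maximum call stack size exceeded" as a RangeError.
  recursiveOverflowed = e instanceof RangeError;
}
const iterativeCount = markIterative(makeChain(1_000_000));
console.log(recursiveOverflowed, iterativeCount);
```

On a desktop engine the recursive walk fails with a catchable exception; on the device, the corresponding native overflow showed up as a crash instead.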

Increasing the native stack to 5 KB allows it to run reliably. (This behavior is consistent with this flow working fine on ESP32 with a similar memory partition, since ESP32 has a larger native stack by default.)

This flow uses the Trigger node. The MCU implementation of the Trigger node is borrowed from Node-RED with a few minor adjustments. Alas, it is a very heavy implementation (with Promises, even!). That is surely using more than a little memory and may be triggering(!) this problem. But that's speculation at this point.

colinl commented 1 year ago

Thanks for looking at this. Do the Function nodes consume stack if there are no messages going through them? I didn't think they did, but I need to go back and check. I don't have the brain power to look at this at the same time as the other issue right now; I will come back to it when that one is sorted out. Fundamentally, though, the device will only be able to handle flows consisting of a handful of nodes, so one needs to be very careful about the flow design.

phoddie commented 1 year ago

@colinl – nothing for you to do here at the moment, I was just sharing an update.

> Do the function nodes consume stack if there are no messages going through them?

No, they do not. But even when running, they mostly consume JavaScript stack space, not the native stack space that is overflowing here.
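One way to picture why idle Function nodes cost no stack while a message passing through a chain of them does: the sketch below models a linear flow of pass-through nodes (like the return msg; Function nodes in the flow above) and measures how deeply the calls nest. It assumes, purely for illustration, that each node hands its output directly to the next one by a function call; that is not a claim about how node-red-mcu delivers messages. deliverQueued is a hypothetical alternative that drains a queue in a loop.

```javascript
// Hypothetical model of a linear flow of pass-through nodes.
// Not node-red-mcu's implementation; all names are illustrative.
let depth = 0;
let maxDepth = 0;

function makeNodes(count) {
  const nodes = [];
  for (let i = 0; i < count; i++) {
    nodes.push({ next: null, onInput: (msg) => msg }); // like `return msg;`
  }
  for (let i = 0; i < count - 1; i++) nodes[i].next = nodes[i + 1];
  return nodes;
}

// Direct delivery: each node calls the next, so frames nest once per node.
function deliverDirect(node, msg) {
  depth++;
  maxDepth = Math.max(maxDepth, depth);
  const out = node.onInput(msg);
  if (node.next && out) deliverDirect(node.next, out);
  depth--;
}

// Queued delivery: a loop drains pending [node, msg] pairs, so nesting stays at 1.
function deliverQueued(first, msg) {
  const queue = [[first, msg]];
  while (queue.length > 0) {
    const [node, m] = queue.shift();
    depth++;
    maxDepth = Math.max(maxDepth, depth);
    const out = node.onInput(m);
    if (node.next && out) queue.push([node.next, out]);
    depth--;
  }
}

// Send one message into a fresh chain and report the deepest nesting observed.
function measure(deliver, count) {
  depth = 0;
  maxDepth = 0;
  deliver(makeNodes(count)[0], { payload: 1 });
  return maxDepth;
}

console.log(measure(deliverDirect, 6)); // 6
console.log(measure(deliverQueued, 6)); // 1
```

With direct calls, nesting grows by one for every node the message visits and returns to zero once delivery finishes; nodes with no message in flight contribute nothing.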

> Fundamentally, though, the device will only be able to handle flows consisting of a handful of nodes, so one needs to be very careful about the flow design.

For sure. Part of that care is using nodes that have relatively light footprints. The Trigger node should really be rewritten to reduce its footprint -- the complexity of the implementation exceeds the functionality.
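For a sense of how small the common path of a Trigger node can be, here is a hedged sketch covering only the configuration in the flow above: send op1, wait 500 ms, send op2, and ignore input while waiting ("extend" unchecked). It uses a single timer handle and no Promises. SimpleTrigger and its constructor options are hypothetical; this is neither the node-red-mcu nor the Node-RED implementation, and it omits reset, per-topic handling and the other modes.

```javascript
// Hypothetical lightweight trigger for one common path only:
// on input send op1, wait durationMs, then send op2.
// Messages arriving while waiting are ignored. Not the node-red-mcu implementation.
class SimpleTrigger {
  constructor({ op1, op2, durationMs }, send,
              timers = { set: (fn, ms) => setTimeout(fn, ms), clear: (h) => clearTimeout(h) }) {
    this.op1 = op1;
    this.op2 = op2;
    this.durationMs = durationMs;
    this.send = send;
    this.timers = timers;
    this.timer = null; // non-null while waiting to send op2
  }

  onInput(msg) {
    if (this.timer !== null) return; // already triggered: ignore
    this.send({ ...msg, payload: this.op1 });
    this.timer = this.timers.set(() => {
      this.timer = null;
      this.send({ ...msg, payload: this.op2 });
    }, this.durationMs);
  }

  close() {
    if (this.timer !== null) this.timers.clear(this.timer);
    this.timer = null;
  }
}

// Usage with a manual fake timer so the sequence can be stepped deterministically.
const sent = [];
let pending = null;
const fakeTimers = { set: (fn) => (pending = fn), clear: () => (pending = null) };
const trig = new SimpleTrigger({ op1: 0, op2: 1, durationMs: 500 }, (m) => sent.push(m.payload), fakeTimers);
trig.onInput({ payload: 1 }); // sends 0 and starts waiting
trig.onInput({ payload: 1 }); // ignored while waiting
pending();                    // timer fires: sends 1
console.log(sent);
```

Injecting the timer functions keeps the sketch testable; on a device the defaults would simply use setTimeout and clearTimeout.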

phoddie commented 1 year ago

This issue motivated me to enhance the runtime to detect native stack overflows. XS already has very well-tested support for detecting stack overflow as a result of our (extensive) fuzz testing for vulnerabilities. That wasn't being used on ESP8266 or ESP32, though. With today's Moddable SDK update, it is. That doesn't fix anything, but it allows the problem to be caught when it happens. Instead of a core dump from the device, xsbug stops and shows the JavaScript stack.

Using that information, I was able to make improvements to the runtime to reduce native stack use. That definitely helps. Those changes are generally good as they also reduce JavaScript stack depth and are likely faster in most cases. I also bumped up the native stack on ESP8266 by 512 bytes, to give a little bit more margin.
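The detection described above amounts to checking the stack budget before a frame is used, so an overflow becomes a reportable error instead of silent corruption. The sketch below emulates that with a byte counter. SimulatedStack, tryDepth and the 256-byte frame cost are hypothetical, and the sizes assume the 4 KB starting point mentioned earlier plus the 512-byte increase.

```javascript
// Hypothetical emulation of a runtime stack-limit check; sizes are illustrative.
class StackOverflowError extends Error {}

class SimulatedStack {
  constructor(sizeBytes) {
    this.size = sizeBytes;
    this.used = 0;
  }
  // Check before reserving a frame, so overflow is reported instead of corrupting memory.
  enter(frameBytes) {
    if (this.used + frameBytes > this.size) {
      throw new StackOverflowError(`stack overflow: need ${this.used + frameBytes} of ${this.size} bytes`);
    }
    this.used += frameBytes;
  }
  leave(frameBytes) {
    this.used -= frameBytes;
  }
}

// Recurse `levels` deep, charging `frameBytes` per level.
function recurse(stack, levels, frameBytes) {
  stack.enter(frameBytes);
  try {
    if (levels > 1) recurse(stack, levels - 1, frameBytes);
  } finally {
    stack.leave(frameBytes);
  }
}

// Returns "ok" or the overflow message, like a debugger stopping with a report.
function tryDepth(sizeBytes, levels, frameBytes) {
  const stack = new SimulatedStack(sizeBytes);
  try {
    recurse(stack, levels, frameBytes);
    return "ok";
  } catch (e) {
    if (e instanceof StackOverflowError) return e.message;
    throw e;
  }
}

console.log(tryDepth(4096, 17, 256));       // stack overflow: need 4352 of 4096 bytes
console.log(tryDepth(4096 + 512, 17, 256)); // ok
```

The extra 512 bytes turn a 17-level failure into a success, but a deeper call chain would fail again, which is why pushing things further can still reach an overflow.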

With all that in place, your particular case doesn't crash for me. You can always push things further to get to a failure, of course. Diagnosing that should be easier moving forward, at least.

To try it out you need to do three things:

FWIW – stack overflow detection is imprecise on these devices. The ESP8266 Arduino code has its approach, FreeRTOS has another, and XS now does what it can. But without an MMU, there's always the chance of missing an overflow. That's life. ESP32 has the same challenge, but it has so much more memory that it can afford a bigger default stack, so native stack overflows there are exceedingly rare.
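Why detection without an MMU is imprecise: a common software technique is to fill the bytes at the stack limit with a known pattern and check them later. The sketch below simulates that with a Uint8Array. The 16-byte guard, the 0xA5 fill value and the helper names are illustrative assumptions, not the exact scheme used by the ESP8266 Arduino core, FreeRTOS or XS.

```javascript
// Hypothetical simulation of guard-pattern ("canary") overflow detection.
// The stack grows downward toward offset 0; the lowest CANARY_BYTES bytes are the guard.
const CANARY_BYTES = 16;
const CANARY_VALUE = 0xa5;

function makeStack(size) {
  const mem = new Uint8Array(size);
  mem.fill(CANARY_VALUE, 0, CANARY_BYTES); // paint the guard region at the stack limit
  return mem;
}

// Checked periodically (for example on a task switch): has the guard been overwritten?
function canaryIntact(mem) {
  for (let i = 0; i < CANARY_BYTES; i++) {
    if (mem[i] !== CANARY_VALUE) return false;
  }
  return true;
}

// Simulate a frame writing `count` bytes starting at offset `start`.
// Negative offsets model memory past the end of the stack, which is not tracked here.
function writeFrame(mem, start, count) {
  for (let i = start; i < start + count; i++) {
    if (i >= 0 && i < mem.length) mem[i] = 0x00;
  }
}

const spilled = makeStack(256);
writeFrame(spilled, 0, 32);   // frame runs into the guard bytes
const skipped = makeStack(256);
writeFrame(skipped, -100, 50); // large frame lands entirely past the guard

console.log(canaryIntact(spilled)); // false: overflow detected
console.log(canaryIntact(skipped)); // true: overflow missed
```

The second case is the gap: a large frame can land entirely past the guard bytes without touching them, and without hardware memory protection nothing traps that write.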

colinl commented 1 year ago

Excellent, that has made a big difference. I can run a flow with an Inject node, DS18B20, simple Function, MQTT out, Trigger, GPIO Out node and Debug node with no problems. In fact to that I can add five more trivial Function nodes and it still runs. If I add a sixth Function node then I get a Stack Overflow message, but I don't get a stack dump.

phoddie commented 1 year ago

Thanks for trying that out so quickly. Very glad to hear that the combination of improvements is working as hoped.

FWIW – there's plenty of opportunity to explore further optimization. Function nodes, for example, are heavier than they probably need to be in order to emulate Node-RED behaviors, like sandboxing. One step at a time though. I think we can close this particular issue out, yes?

colinl commented 1 year ago

Yes, thanks.