meshtastic / firmware

Meshtastic device firmware
https://meshtastic.org
GNU General Public License v3.0
2.98k stars 714 forks source link

[Bug]: MQTT with JSON crashes on nRF52 boards #2804

Open mdjurfeldt opened 9 months ago

mdjurfeldt commented 9 months ago

Category

Other

Hardware

Rak4631

Firmware Version

2.2.7.8b82ae6f

Description

I've got MQTT woŕking on a RAK4631 with ethernet (not wifi).

However, when I enable json, the device crashes and reboots soon after the first packet has been published through MQTT.

My interpretation of this is that the formatting of the MQTT json payload somehow trashes memory.

I'm not attaching any log since it simply is a random crash soon after the first packet. Also, this of course means that the device enters a reboot loop since there will be a packet sooner or later.

Relevant log output

No response

GUVWAF commented 9 months ago

This issue seems specific to the RAK4631 + Ethernet combination. A related issue was closed because it was assumed to be fixed: https://github.com/meshtastic/firmware/issues/2242, but it seems that is not the case.

mdjurfeldt commented 9 months ago

OK Just to be clear: This only happens when JSON is enabled in the MQTT app. If JSON is disabled, MQTT properly relays protobuf encoded messages.

Kollensperger commented 7 months ago

This issue seems specific to the RAK4631 + Ethernet combination. A related issue was closed because it was assumed to be fixed: #2242, but it seems that is not the case.

That’s correct - the issue appeared only with JSON on RAK4631 with ethernet, t-beam and wifi were not affected. I never got it to work even after the issue was closed

GUVWAF commented 6 months ago

I observed this also happens on nRF52 platforms when using the MQTT client proxy. To me it looks like a stack overflow due to recursive calls in the JSON library: https://github.com/meshtastic/firmware/blob/2e9cc73ebbc8daeefdd3d2342a79aef6a9051127/src/mqtt/JSONValue.cpp#L808

If I replace the recursive call to StringifyImpl(indentDepth1) with just "x", it doesn't crash. However, rewriting this part to an iterative approach looks to be quite some work. I tried increasing the stack size but unfortunately that didn't help.

PoincareMesh commented 6 months ago

Also using RAK4631 with ethernet on 2.2.17 , had a lot of issues (slow response, ...) connecting/configuring via BLE, usb and network but also these lock ups, until I disabled JSON in the MQTT module. Now it's working like a charm, but having JSON MQTT messages (again) would even be better.

arne182 commented 5 months ago

I can confirm RAK4631 with ethernet crashes with json and nRF52 t-echo also crashes with json and client proxy.

GUVWAF commented 5 months ago

The callstack I got from the debugger is as follows:

JSONValue::StringifyImpl[abi:cxx11](unsigned int) const@0x00051908 (/firmware/src/mqtt/JSONValue.cpp:808) std::char_traits::copy@0x00037386 (/.platformio/packages/toolchain-gccarmnoneeabi@1.70201.0/arm-none-eabi/include/c++/7.2.1/bits/char_traits.h:292) std::cxx11::basic_string<char, std::char_traits, std::allocator >::_S_copy@0x00037386 (/.platformio/packages/toolchain-gccarmnoneeabi@1.70201.0/arm-none-eabi/include/c++/7.2.1/bits/basic_string.h:324) std::cxx11::basic_string<char, std::char_traits, std::allocator >::_S_copy_chars@0x00037386 (/.platformio/packages/toolchain-gccarmnoneeabi@1.70201.0/arm-none-eabi/include/c++/7.2.1/bits/basic_string.h:366) std::cxx11::basic_string<char, std::char_traits, std::allocator >::_M_construct<char*>@0x00037386 (/.platformio/packages/toolchain-gccarmnoneeabi@1.70201.0/arm-none-eabi/include/c++/7.2.1/bits/basic_string.tcc:225) std::cxx11::basic_string<char, std::char_traits, std::allocator >::_M_construct_aux<char>@0x00051b3a (/.platformio/packages/toolchain-gccarmnoneeabi@1.70201.0/arm-none-eabi/include/c++/7.2.1/bits/basic_string.h:220) std::__cxx11::basic_string<char, std::char_traits, std::allocator >::_M_construct<char>@0x00051b3a (/.platformio/packages/toolchain-gccarmnoneeabi@1.70201.0/arm-none-eabi/include/c++/7.2.1/bits/basic_string.h:239) std::cxx11::basic_string<char, std::char_traits, std::allocator >::basic_string@0x00051b3a (/.platformio/packages/toolchain-gccarmnoneeabi@1.70201.0/arm-none-eabi/include/c++/7.2.1/bits/basic_string.h:424) std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, JSONValue>::pair@0x00051b3a (/.platformio/packages/toolchain-gccarmnoneeabi@1.70201.0/arm-none-eabi/include/c++/7.2.1/bits/stl_pair.h:292) __gnu_cxx::new_allocator<std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, JSONValue> > >::construct<std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, JSONValue*>, std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, JSONValue> const&>@0x00051b3a (/.platformio/packages/toolchain-gccarmnoneeabi@1.70201.0/arm-none-eabi/include/c++/7.2.1/ext/new_allocator.h:136) std::allocator_traits<std::allocator<std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, JSONValue> > > >::construct<std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, JSONValue*>, std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, JSONValue> const&>@0x00051b3a (/.platformio/packages/toolchain-gccarmnoneeabi@1.70201.0/arm-none-eabi/include/c++/7.2.1/bits/alloc_traits.h:475) std::_Rb_tree<std::cxx11::basic_string<char, std::char_traits, std::allocator >, std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, JSONValue>, std::_Select1st<std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, JSONValue*> >, std::less<std::cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, JSONValue> > >::_M_construct_node<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, JSONValue> const&>@0x00051b3a (/.platformio/packages/toolchain-gccarmnoneeabi@1.70201.0/arm-none-eabi/include/c++/7.2.1/bits/stl_tree.h:626) std::_Rb_tree<std::cxx11::basic_string<char, std::char_traits, std::allocator >, std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, JSONValue>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, JSONValue> >, std::less<std::cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, JSONValue*> > >::_M_create_node<std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, JSONValue> const&>@0x00051b3a (/.platformio/packages/toolchain-gccarmnoneeabi@1.70201.0/arm-none-eabi/include/c++/7.2.1/bits/stl_tree.h:643) std::_Rb_tree<std::cxx11::basic_string<char, std::char_traits, std::allocator >, std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, JSONValue>, std::_Select1st<std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, JSONValue*> >, std::less<std::cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, JSONValue*> > >::_Alloc_node::operator()<std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, JSONValue> const&>@0x00051b3a (/.platformio/packages/toolchain-gccarmnoneeabi@1.70201.0/arm-none-eabi/include/c++/7.2.1/bits/stl_tree.h:556) std::_Rb_tree<std::cxx11::basic_string<char, std::char_traits, std::allocator >, std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, JSONValue>, std::_Select1st<std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, JSONValue*> >, std::less<std::cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, JSONValue> > >::_M_clone_node<std::_Rb_tree<std::cxx11::basic_string<char, std::char_traits, std::allocator >, std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, JSONValue>, std::_Select1st<std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, JSONValue> >, std::less<std::cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, JSONValue> > >::_Alloc_node>@0x00051b3a (/.platformio/packages/toolchain-gccarmnoneeabi@1.70201.0/arm-none-eabi/include/c++/7.2.1/bits/stl_tree.h:666) std::_Rb_tree<std::cxx11::basic_string<char, std::char_traits, std::allocator >, std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, JSONValue>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, JSONValue> >, std::less<std::cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, JSONValue> > >::_M_copy<std::_Rb_tree<std::cxx11::basic_string<char, std::char_traits, std::allocator >, std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, JSONValue>, std::_Select1st<std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, JSONValue*> >, std::less<std::cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, JSONValue*> > >::_Alloc_node>@0x00051b3a (/.platformio/packages/toolchain-gccarmnoneeabi@1.70201.0/arm-none-eabi/include/c++/7.2.1/bits/stl_tree.h:1830) ??@0x00000006 (Unknown Source:0)

It seems somewhere a copy deep in the toolchain goes wrong.

arne182 commented 5 months ago

std::string JSONValue::StringifyImpl(size_t const indentDepth) const { std::string ret_string; size_t const indentDepth1 = indentDepth ? indentDepth + 1 : 0; std::string const indentStr = Indent(indentDepth); std::string const indentStr1 = Indent(indentDepth1);

Would changing the std::string to char buffer[MAX_BUFFER_SIZE]; and strcat(buffer, someOtherString); This would avoid the Dynamic Memory Allocation that comes with std::string?

GUVWAF commented 5 months ago

@arne182 I just tried this to be sure, but as I expected this doesn’t help because the memory allocation is for typedef std::map<std::string, JSONValue *> JSONObject;. I believe all std::strings in the JSON library should then be replaced by statically allocated buffers.

arne182 commented 5 months ago

Here's an outline of the what must be done:

Replace std::string with Static Buffers: Replace the std::string type with statically allocated character arrays (char[]). This change will help to avoid the dynamic memory allocations that std::string performs.

Modify std::map Key Type: Since std::map keys in your case are std::string, you'll need to replace them with a suitable type that can handle statically allocated strings. This might involve creating a custom comparator for the map that can handle char[] keys.

Custom String Handling Functions: Implement or use existing functions for handling these static string buffers, such as for copying, comparing, and concatenating strings.

Adjust JSON Library Code: Refactor the JSON library code to use these new data types and handling functions. This might involve significant changes depending on how the library is structured.

Manage Buffer Sizes: Carefully manage the sizes of your static buffers. They should be large enough to handle the expected data but not so large as to waste limited memory resources.

Test Extensively: After making these changes, test the code extensively. Pay particular attention to buffer overflows, memory leaks, and proper handling of edge cases.

Consider Alternative Data Structures: If std::map proves problematic even after replacing std::string, consider using alternative data structures that might be more memory efficient and better suited for your specific use case.

GUVWAF commented 5 months ago

That sounds like an answer generated by an LLM. If you can let it write the required code as well, that’d be great because it would be a lot of work.

arne182 commented 5 months ago

That is where I got the idea for fixing the problem

al3ph commented 5 months ago

https://github.com/MaJerle/lwjson Something like this ? ...... Hmm bad example, as this doesn't support output, but the general idea is still valid. There are a bunch of libs like this out there.

https://arduinojson.org/ Might work.

arne182 commented 5 months ago

Exactly like that

Marx1 commented 4 months ago

I'm highly interested in this, as I'm working on a Meshstastic to AREDN Meshchat bridge and unencrypted JSON data is key for it to work.

As it stands I have dead hardware due to this bug.

GUVWAF commented 4 months ago

@Marx1 If you’re depending on JSON for nRF52 boards, I would recommend to decode the protobufs yourself instead of on the device, as this issue seems not easy to resolve. You can leave encryption for MQTT disabled and then convert the protobufs into JSON. See https://github.com/pdxlocations/Meshtastic-MQTT-Connect/blob/main/meshtastic-mqtt-connect.py#L265 for an example of how to decode the protobufs in Python.

geeksville commented 4 months ago

This issue has been mentioned on Meshtastic. There might be relevant details there:

https://meshtastic.discourse.group/t/rak4631-ethernet-not-working/7010/17

ruswerner commented 4 months ago

I frustratingly ran into this issue and by chance found mention of it on the forums. It would be great to explicitly mention this on the RAK4631 hardware page in the docs and also on the MQTT page in the docs. It would be great for JSON to "just work" on the nRF processor, but I actually think it's better that it doesn't in order to keep the cpu load light. This is the type of thing better suited to a server-side process. I would recommend "fixing" this issue by simply ignoring the "JSON Enabled" setting on the MQTT module along with appropriate documentation as to why.

I'm currently writing a Typescript container that will pull messages off MQTT, decode the protobufs and push them into Postgres. This is really the idea solution here.

Kollensperger commented 4 months ago

For those using node red, there is a good implementation for the decode and decrypt part which works really well and has become quite user friendly - https://github.com/meshtastic/node-red-contrib-meshtasticThe only thing missing atm is the encode option, hopefully this will come in the future. Once I moved from JSON to protobuf this way, my rak4631 with Ethernet module started working flawlessly. On 25 Feb 2024, at 08:08, ruswerner @.***> wrote: I frustratingly ran into this issue and by chance found mention of it on the forums. It would be great to explicitly mention this on the RAK4631 hardware page in the docs and also on the MQTT page in the docs. It would be great for JSON to "just work" on the nRF processor, but I actually think it's better that it doesn't in order to keep the cpu load light. This is the type of thing better suited to a server-side process. I would recommend "fixing" this issue but simply ignore the "JSON Enabled" setting on the MQTT module along with appropriate documentation as to why. I'm currently writing a Typescript container that will pull messages off MQTT, decode the protobufs and push them into Postgres. This is really the idea solution here.

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you commented.Message ID: @.***>