meshtastic / firmware

Meshtastic device firmware
https://meshtastic.org
GNU General Public License v3.0
2.99k stars 715 forks source link

Fix MQTT queue use-after-free #3971

Open mskvortsov opened 1 month ago

mskvortsov commented 1 month ago

There is an issue with the MQTT queue implementation. Enqueued ServiceEnvelope items contain fields pointing to objects that may be deallocated in another thread before the fields are accessed. This leads to the publishing of garbage.

One approach would be just encapsulating the fields (MeshPacket and both strings) into a new QueueItem structure by value. However, since ServiceEnvelope seems to be used only in the context of MQTT.cpp, changing field annotation at the protobuf level also seems reasonable.

Requires https://github.com/meshtastic/protobufs/pull/503.

mskvortsov commented 1 month ago

An example of the failure in the wild

==73485== Invalid read of size 4
==73485==    at 0x238AB4: MQTT::meshPacketToJson[abi:cxx11](_meshtastic_MeshPacket*) (MQTT.cpp:889)
==73485==    by 0x23561F: MQTT::publishQueuedMessages() (MQTT.cpp:464)
==73485==    by 0x23535F: MQTT::runOnce() (MQTT.cpp:420)
==73485==    by 0x1B4A57: concurrency::OSThread::run() (OSThread.cpp:85)
==73485==    by 0x274ECF: ThreadController::runOrDelay() (ThreadController.cpp:59)
==73485==    by 0x1E6547: loop (main.cpp:1014)
==73485==    by 0x293E47: main (main.cpp:209)
==73485==  Address 0x7560a78 is 280 bytes inside a block of size 312 free'd
==73485==    at 0x4887B40: free (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==73485==    by 0x20E79F: MemoryDynamic<_meshtastic_MeshPacket>::release(_meshtastic_MeshPacket*) (MemoryPool.h:64)
==73485==    by 0x209C03: RadioLibInterface::completeSending() (RadioLibInterface.cpp:334)
==73485==    by 0x209B7B: RadioLibInterface::handleTransmitInterrupt() (RadioLibInterface.cpp:319)
==73485==    by 0x20981B: RadioLibInterface::onNotify(unsigned int) (RadioLibInterface.cpp:228)
==73485==    by 0x1B4633: concurrency::NotifiedWorkerThread::checkNotification() (NotifiedWorkerThread.cpp:82)
==73485==    by 0x1B465B: concurrency::NotifiedWorkerThread::runOnce() (NotifiedWorkerThread.cpp:89)
==73485==    by 0x1B4A57: concurrency::OSThread::run() (OSThread.cpp:85)
==73485==    by 0x274ECF: ThreadController::runOrDelay() (ThreadController.cpp:59)
==73485==    by 0x1E6547: loop (main.cpp:1014)
==73485==    by 0x293E47: main (main.cpp:209)
==73485==  Block was alloc'd at
==73485==    at 0x48850C8: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==73485==    by 0x20E7C3: MemoryDynamic<_meshtastic_MeshPacket>::alloc(unsigned int) (MemoryPool.h:71)
==73485==    by 0x20A2BF: Allocator<_meshtastic_MeshPacket>::allocZeroed(unsigned int) (MemoryPool.h:28)
==73485==    by 0x20A257: Allocator<_meshtastic_MeshPacket>::allocZeroed() (MemoryPool.h:18)
==73485==    by 0x20D3FF: Router::allocForSending() (Router.cpp:116)
==73485==    by 0x21BC6F: SinglePortModule::allocDataPacket() (SinglePortModule.h:34)
==73485==    by 0x229FFB: ProtobufModule<_meshtastic_Telemetry>::allocDataProtobuf(_meshtastic_Telemetry const&) (ProtobufModule.h:45)
==73485==    by 0x229D6B: DeviceTelemetryModule::sendTelemetry(unsigned int, bool) (DeviceTelemetry.cpp:93)
==73485==    by 0x229A6F: DeviceTelemetryModule::runOnce() (DeviceTelemetry.cpp:24)
==73485==    by 0x1B4A57: concurrency::OSThread::run() (OSThread.cpp:85)
==73485==    by 0x274ECF: ThreadController::runOrDelay() (ThreadController.cpp:59)
==73485==    by 0x1E6547: loop (main.cpp:1014)
thebentern commented 1 month ago

Couldn't we check if the pointers are valid instead of converting these field to value via the protobuf annotations?

mskvortsov commented 1 month ago

Reference counting for the MeshPacket objects in the base allocation pool could be an option. That’s a bit more invasive change though.

mskvortsov commented 1 month ago

I've experimented with adding a hidden refCount field to meshtastic_MeshPacket (hidden in the sense of existing only at C++ code level) and actually freeing up the packet from packetPool only if the counter becomes 1. Unfortunately, this approach done right requires some additional support from Nanopb: at the moment there is no good way to extend message A to A with an auxiliary field added so that other message B encapsulating A will also use this extended A.

So I would stick with the solution that was originally proposed.