meshtastic / firmware

Meshtastic device firmware
https://meshtastic.org
GNU General Public License v3.0
3.52k stars 869 forks source link

[Bug]: Heap-Leak in RangeTestModule #2228

Closed karamo closed 1 year ago

karamo commented 1 year ago

Category

Other

Hardware

T-Beam

Firmware Version

2.0.13

Description

One way to increase the sending rate of messages is using the RangeTestModule.

blue: WZ1 = Sender

meshtastic --port com16 --set range_test.enabled true --set range_test.save false
meshtastic --port com16 --set range_test.sender 20

red: TEST = Receiver meshtastic --port com8 --reboot

magenta only for reference Issue_3

Result: almost 4 times larger heap leak Stands in contrast to tx_enabled false issue #2227

Relevant log output

No response

karamo commented 1 year ago

To check the effect ... RangeTest disabled and: meshtastic --port com16 --set position.position_broadcast_secs 20

Issue_4

320 POSITION_APP in 6440" = 20" / POSITION Result: does not affect the Heap-derating

rpsainio commented 1 year ago

@karamo Is it possible to have some statistics as the debug output gives out also the module like below: DEBUG | 16:07:52 1016 [Router] ++++++ Thread Router freed heap 112752 -> 113040 (288) ++++++ I recall that you use MQTT for transport; even there you can extend the content

karamo commented 1 year ago

@rpsainio yes, I can repeat the test. What exactly would you see? The whole DEBUG-Log betweend two reboots?

rpsainio commented 1 year ago

@karamo If you can Linux, then maybe so: grep -a "\-\-\-\-\-\- Thread " name_of_the_file_log grep -a "\+\+\+\+\+\+ Thread " console_04.txt

karamo commented 1 year ago

My own USB-Monitor-Program will do the job. Test ist running. I'll send the DEBUG_LOG then here.

rpsainio commented 1 year ago

Send the LOG - fine as well

karamo commented 1 year ago

Here ist the full LOG. Time is UTC, FW 2.0.13+my MQTT-Mods RangeTest_LOG_tbeam.zip

rpsainio commented 1 year ago

Quick and dirty:

!/usr/bin/env python

DEBUG | ??:??:?? 2 [Screen] ++++++ Thread Screen freed heap 90420 -> 90440 (20) ++++++

DEBUG | ??:??:?? 11 [WebServerThread] ------ Thread WebServerThread leaked heap 70644 -> 70492 (-152) ------

import sys

if name == 'main': heap_items = {} with open(sys.argv[1], 'r') as f: for line in f.readlines():

get the name of the module

       mod_start = line.find("[")
       mod_end = line.find("]")
       value_start = line.find("(")
       value_end = line.find(")")
       if(mod_start > 0  and mod_end > 0  and value_start >0 and value_end>0):
           module = line[mod_start+1:mod_end]
           value = int(line[value_start+1:value_end]) 
           #print("%s %d" % (module, value))
           # add to dictionary
           if (not module in heap_items.keys()):
               heap_items[module]=value
           else:
               heap_items[module]+=value 
       else:
           print("Not a proper line %d %d %d %d %s" % (mod_start,mod_end,value_start,value_end,line))
   sum = 0          
   for module,value  in heap_items.items():
       sum += value
       print("%s %d" %(module, value))
   print("\nTotal heap used:%d" % ( sum)) 

./heap_summary.py MemoHeap.txt Not a proper line -1 -1 -1 -1 DEBUG | ??:??:?? 0 ======================================================================= Not a proper line -1 -1 -1 -1 DEBUG | ??:??:?? 0 ======================================================================= Not a proper line 20 26 -1 -1 ====== heap too low [11000] -> reboot in 1s ====== Not a proper line -1 -1 -1 -1 DEBUG | ??:??:?? 0 ======================================================================= Not a proper line -1 -1 -1 -1 DEBUG | ??:??:?? 0 =======================================================================

Screen -6228 RangeTestModule -96736 WifiConnect -380 PowerFSM -4488 RadioIf -45364 Router 60436 Blink -7048 Button 9056 WebServerThread 412256 DeviceTelemetryModule -4600 mqtt -242880 Power -300688 NodeInfoModule -2680 PositionModule -2252 GPS 18556 AirTime 48 ApiServer -104 SerialConsole -6884

Total heap used:-219980

So, lies, big lies, statistics - Do not know why WebServerThread frees up so much. I even grepped the file to see that. Maybe you can spot a programming error here...

karamo commented 1 year ago

@rpsainio your program is ok, but this type of evaluation that you make is inadmissible. The plus and minus values for a given thread are never the same. See my simple evaluation with Excel. MemoHeap_xls.zip

rpsainio commented 1 year ago

@karamo Too bad, still worth trying. I expected to see some threads sucking heap and others being more balanced as I saw that threads were releasing the same amount as they allocated: DEBUG | 18:16:13 127 [AirTime] ++++++ Thread AirTime freed heap 62536 -> 62560 (24) ++++++ DEBUG | 18:26:35 749 [AirTime] ++++++ Thread AirTime freed heap 41372 -> 41396 (24) ++++++ DEBUG | 18:27:11 785 [AirTime] ++++++ Thread AirTime freed heap 40128 -> 40152 (24) ++++++ DEBUG | 18:28:57 891 [AirTime] ++++++ Thread AirTime freed heap 35132 -> 37456 (2324) ++++++ DEBUG | 18:37:04 1379 [AirTime] ------ Thread AirTime leaked heap 27088 -> 24764 (-2324) ------ DEBUG | 18:39:22 1517 [AirTime] ------ Thread AirTime leaked heap 23212 -> 23188 (-24) ------

This one is +48

karamo commented 1 year ago

In my opinion, the DEBUG in OSThread.cpp does not represent the relationship between thread and heap. OSThread::run() takes only samples

caveman99 commented 1 year ago

In my opinion, the DEBUG in OSThread.cpp does not represent the relationship between thread and heap. OSThread::run() takes only samples

OSThread::run() is the executor for the runOnce part of every thread. if you measure heap before and after you get a pretty good measurement of what happend in between. The instances that allocate memory just to have it freed by another thread is passing buffers most of the time... as long as everyone cleans up data if it is not needed any more there should be no heap decrease.

karamo commented 1 year ago

RangeTest heap leak still available even with FW 2.0.18 - green line. Orange Node is for reference with "old" FW 2.0.17 RangeTest-1

karamo commented 1 year ago

PR #2245 was the beginning ... Then I have localized it in src\modules\esp32\RangeTestModule.cpp

void RangeTestModuleRadio::sendPayload(NodeNum dest, bool wantReplies)
{
    meshtastic_MeshPacket *p = allocDataPacket();

That line creates the heap leak. Need help to release the allocated Heap.

rpsainio commented 1 year ago

One could just allocate/deallocate outside sendPayload and pass it to sendPayload as the function does not know if it gets called again (ever) that would save excess allocation. OTOH, allocating and deallocating every time within function gets called lessens the risk of heap getting defragmented as there is more symmetry,

karamo commented 1 year ago

@rpsainio ok ... hmm ... ich lese deine Worte ... klingen für mich logisch und verständlich ... aber ich kann das nicht umsetzen. Hi Hi ...

now in english: OK... Hmm... I read your words... sound logical and understandable to me ... but I can't do that. My domain is not C/C++ ... my domain is programming in Delphi.

rpsainio commented 1 year ago

Well, if you found the problem, then you have done your part of the job, I know that devops expects more, but sometimes it is better to good at something and not mediocre on everything. I have to look how they malloc and free in this solution.

rpsainio commented 1 year ago

Ahh, Delphi - honestly I had to look it up again, Borland, Object Pascal and Pascal brings back some memories... Oh well, there should be a way to mark Off-Topic replies

caveman99 commented 1 year ago

i'm at it - there appears to be something weird with the queuestatus code.

rpsainio commented 1 year ago

Wading through the code shows that since packets get queued up and sometime later sent, this is really send and pray - that somebody down there does packetPool.release(p). Kudos to caveman99 when he solves this one.

mc-hamster commented 1 year ago

Wading through the code shows that since packets get queued up and sometime later sent, this is really send and pray - that somebody down there does packetPool.release(p). Kudos to caveman99 when he solves this one.

5 minute packet ttl in the queue.

rpsainio commented 1 year ago

@karamo do you happen to remember when you uploaded the MemoAllInput.txt if the RangeTest run all the time or only part of the time.

karamo commented 1 year ago

@rpsainio No, but I can repeat it. I will flash all my nodes with FW 2.0.18 today an then I can test it with DEBUG via USB.

karamo commented 1 year ago

With the help of Manuel @mverch67 we were able to follow the path of the message, starting from void RangeTestModuleRadio::sendPayload(NodeNum dest, bool wantReplies) until the successful packetPool.release(p); and also confirm that no heap leak occurs in this way. According to the rule: "Look there too where you don't expect it!" I was then able to make a small change in

src\modules\esp32\RangeTestModule.cpp
void RangeTestModuleRadio::sendPayload( ...

RangeTest-ok

The result is that a heap leak no longer occurs (blue line).

karamo commented 1 year ago

I solved it just for the RangeTestModule by set: p->want_ack = false; Not strictly required in the RangeTestModule.

garthvh commented 1 year ago

I solved it just for the RangeTestModule by set: p->want_ack = false; Not strictly required in the RangeTestModule.

The ACK is required for the Range Test Module functionality, it provides the SNR and RSSI data in the CSV file.