maxgerhardt / platform-raspberrypi

Raspberry Pi: development platform for PlatformIO
Apache License 2.0
94 stars 46 forks source link

Serial Print Deadlock with Multicore #7

Closed savejeff closed 2 years ago

savejeff commented 2 years ago

Hi,

Ive found a Problem that seems to only happen with the PIO core and not with the Arduino Version.

The original issue (here) I opened in the earlephilhower/arduino-pico was closed.

I tested the code on multiple Windows 10 PCs with the same result

Here the relevant code. I've reduced the Pio project to its minimum Demo_RP2040_Deadlock_Clean.zip

and this is the ulf that i get and doesn't work: firmware.zip

As it works with Arduino IDE and not with PIO, might this be caused by optimization?

maxgerhardt commented 2 years ago

Thanks, I will check out it later in the debugger on why it locks up / what state it is in.

savejeff commented 2 years ago

That would be awesome thanks! I although that about using the Pico Probe to check what actually happens here, but i've not yet used it. The mbed core i used so far did not support it. if the arduino_pico core supports the pico probe well that would be another bonus.

maxgerhardt commented 2 years ago

In the new platform integration (https://github.com/platformio/platform-raspberrypi/pull/36), PicoProbe debugging works with both ArduinoCore-mbed and Arduino-Pico as the used Arduino core. And I've also tested that this works with both cores, so if one core hits a breakpoint (or I just press the pause button), I see the exact execution state of both cores.

savejeff commented 2 years ago

That sounds very good. I think I'll look into it over the weekend.

earlephilhower commented 2 years ago

I can repro the failure locally using PIO and the latest integration (just replaced the platform.ini with the lines given in that PR).

Both the RAM and flash sizes are different from when built in the IDE under any optimization option. I would expect flash sizes to differ w/optimization, but the RAM used normally does not change as those are simply global vars which are invariant to optimization levels.

I'm still learning PIO so I'll try to get the actual compile and link commands and see what differences I find (but no guarantees...only times I've used P.IO is for building Marlin and it was just "hit return")

savejeff commented 2 years ago

I can really recommend PIO. I love it more every day I use it.

I'll try to get into step-by-step debugging with SWD and Pico Probe. maybe ill find the spot where it gets stuck.

@maxgerhardt what are the default optimization settings in PIO ? maybe the mutex's are copied inline. but that doesn't really explain the freez

maxgerhardt commented 2 years ago

What I see with the debugger: After 2 iterations, the second core gets into a hardfault while trying to de-allocate a String object.

grafik

Not yet sure why.

Maybe the String gets used beyond the time the log function is left (there is no Serial.flush() so it's probably still in its ringbuffer), or something else..

maxgerhardt commented 2 years ago

The faulting instruction is a load-into-register instruction with the source being r7 + 4, but the r7 register is pointing at 0x4db85158, is that even valid RAM?

grafik

Heap corruption when the heap is concurrently used by e.g. the String functions?

Are heap free() and malloc() function guarded by a mutex and threadsafe?

Well GDB definitely can't read from that memory address..

x/1x 0x4db85158
Info : SWD DPIDR 0x0bc12477
Info : SWD DLPIDR 0x10000001
Error: Failed to read memory at 0x4db85158
earlephilhower commented 2 years ago

Some GOOG and some tr later, I found the difference.

The PIO build is running without any default optimization. The IDE uses -Os by default.

All tested optimization levels work in the IDE except for -O which is, IIRC, the GCC default when none is specified.

So, looks to be not PIO related, just GCC optimization and the core related. No idea what specifically at this point, but a simple workaround: add -Os to the build_flags = line.

earlephilhower commented 2 years ago

Also, the RAM and FLASH size differences are not real, just PIO accounting for things differently than the IDE's -size does. So no real worries other than the missing optimizaiton level and potential issue in the core itself (i.e. not needed to be tracked here)

maxgerhardt commented 2 years ago

I can confirm that with the latest version, by default there is indeed no -Os line. Not sure how I could have overlooked that -- though I think someone told me about it once.

arm-none-eabi-g++ -o .pio\build\pico\FrameworkArduino\main.cpp.o -c -std=gnu++17 -fno-exceptions -fno-rtti -Werror=return-type -march=armv6-m -mcpu=cortex-m0plus -mthumb -ffunction-sections -fdata-sections -DPLATFORMIO=60002 -DARDUINO_RASPBERRY_PI_PICO -DARDUINO_ARCH_RP2040 -DUSBD_MAX_POWER_MA=500 -DARDUINO=10810 -DARDUINO_ARCH_RP2040 -DF_CPU=133000000L -DBOARD_NAME=\"pico\" -DCFG_TUSB_MCU=OPT_MCU_RP2040 -DUSB_VID=0x2e8a -DUSB_PID=0x000a "-DUSB_MANUFACTURER=\"Raspberry Pi\"" -DUSB_PRODUCT=\"Pico\" -DSERIALUSB_PID=0x000a [...] C:\Users\Max.platformio\packages\framework-arduinopico\cores\rp2040\main.cpp

(Cut out includes for brievity)

savejeff commented 2 years ago

I can confirm that the first tests with build_flags = -O2 do run without a deadlock. it looks promising.

I'll run some more tests later!

maxgerhardt commented 2 years ago

Patched by https://github.com/earlephilhower/arduino-pico/pull/619.

However, it leaves a bad feeling: When compiling with -O0 or -Og (which it btw by default does when you start a debug session in PlatformIO), then there is a possible race in the free routine of the heap?

Default optimization is correct and should be done, but there might be a sleeper cell here that might blow up later. With optimization it might only reduce the chances of it happening, not fully eliminating it.

maxgerhardt commented 2 years ago

The String dealloc happens in jtask1, likely after the LOG_FUNC has been exited. The log function accepts the String by reference. Maybe it would make a difference if it accepted it by copy and the delete was mutex protected? Hm.

savejeff commented 2 years ago

If i understand correctly: The String variable becomes out of scope and is deleted, but the Serial write function is still using the object (and not a copy of it)

Every copying in the logfunction would thus only move the scope around a bit but change not the problem itself

earlephilhower commented 2 years ago

If i understand correctly: The String variable becomes out of scope and is deleted, but the Serial write function is still using the object (and not a copy of it)

No, that can't be the case here. The Serial.write/etc. calls are all synchronous. For the USB code, it's just copying the string to the USB buffers and returning. You can free/delete the data you sent in any time after the call. A copy of the data is already stuffed in the USB transmit buffers at that point.

I think @maxgerhardt was thinking there might be a multithreaded unsafe malloc/free implementation so that the memory manager could end up giving out bad addresses for the newly allocated Strings.

earlephilhower commented 2 years ago

In any case, let's move this back to https://github.com/earlephilhower/arduino-pico/issues/614 as it is not platform.io related.

earlephilhower commented 2 years ago

There really is a race condition that is unprotected. malloc and free are protected with __wrap mutex-protected functions.

realloc is not, though! There is no __wrap_realloc and hence 2 reallocs run at the same time and cause the heap to go crazy.

I need to see if I broke the pico-sdk or if I somehow missed including the wrapper in the release...

earlephilhower commented 2 years ago

This can be closed. Latest pico-sdk has the patch applied while the full patch is a PR in the RPi repo.

savejeff commented 2 years ago

Ah very nice! Thanks for the thorough search and analysis. I even gained some insight ;] Didn't think a simple "my print func causes freeze" would lead down this rabbit hole.

When will the latest pico-sdk be updated into the core?

earlephilhower commented 2 years ago

When will the latest pico-sdk be updated into the core?

The latest master has the patch already, and it'll go out to everyone on 1.3.2 (whenever the RPi team releases and assuming they accept it). Update and you should be good to go.