u-blox / ubxlib

Portable C libraries which provide APIs to build applications with u-blox products and services. Delivered as add-on to existing microcontroller and RTOS SDKs.
Apache License 2.0
287 stars 82 forks source link

stm32u5 example hanging #256

Closed mikehealy closed 1 week ago

mikehealy commented 2 weeks ago

Hi,

Using the preview_platform_stm32u5_rmea branch I am trying to get an example running with an NUCLEO-U575ZI-Q board connected to a LEXI-R520, using FreeRTOS. The application appears to be getting stuck in preambleHeapDefence. See attached output.txt.

I'm new to ubxlib, so I'm probably doing something wrong/silly here.

The only changes I've made after checking out the preview_platform_stm32u5_rmea branch are to u_cfg_app_platform_specific.h (see attached u_cfg_app_platform_specific.diff.txt) to reflect how things are wired up, and to the runner_freertos/Makefile (see attached Makefile.diff.txt)

My build command, from within the ubxlib/port/platform/stm32cube/mcu/stm32u5/runner_freertos folder is:

make STM32_CMSIS_FREERTOS_PATH=/home/mgh/github/STMicroelectronics/x-cube-freertos UNITY_PATH=/home/mgh/github/ThrowTheSwitch/Unity ARM_GCC_TOOLCHAIN_PATH=/opt/st/stm32cubeclt_1.15.1/GNU-tools-for-STM32/bin.

Digging deeper, I see that within preambleHeapDefence it is getting as far as uPortDeinit() (line 133).

In turn, within uPortDeinit() it is uPortEventQueuePrivateDeinit() (line 269) that is proving problematic.

Within uPortEventQueuePrivateDeinit(), the first time through the for loop, i.e. x == 0, eventQueueFree() is never returning.

Within eventQueueFree(), it's uPortQueueSend() (line 175) that is never returning (first attempt to call it).

In uPortQueueSend() it is gettting as far as osMessageQueuePut() (line 308).

I'm using the current main branch of x-cube-freertos (so v1.1.0). In osMessageQueuePut() it is getting to xQueueSendToBack() (line 2176).

xQueueSendToBack() in turn leads to xQueueGenericSend(). From this point it is getting a bit tricky to keep digging (because xQueueGenericSend() seems to be getting called fairly frequently).

Any ideas as to what might be going wrong? Do I have a build environment or configuration problem or ...?

RobMeades commented 1 week ago

Hi, thanks for posting, and sorry you're having trouble with this.

It is really weird: obviously this is a relatively fundamental thing. If osMessageQueuePut() is not returning, that might suggest that somehow FreeRTOS thinks the queue is full but nothing will have happened at this point, as you can see preambleHeapDefence() is the very first "test" that is run, and the uPortDeinit() call is right at the start of it; uPortDeinit() should be pushing on an open door when shutting things down, there will be nothing on any queues.

We tested the preview branch with the same version of x-cube-freertos as you, v1.1.0, so it seems unlikely to be a version thing.

I can't see anything wrong with your build command-line or your file changes but a first step, if you haven't tried this already, might be to run without the file changes and see if execution gets past this point; the fact that the pins etc. don't line up with reality shouldn't matter up to here, at least. I'd guess this will turn out the same way but it is worth eliminating as a possibility.

Just for reference, find attached the log output from the target when we last ran this ourselves.

I'm a bit stuck as to what else to suggest. Maybe something will come to me overnight...

RobMeades commented 1 week ago

This isn't going to be helpful at all but I have a very vague memory of having a problem of this nature (i.e. an unexplained failure to run quite early on) when starting the integration of STM32U5 and it was due to the clock/tick not running.

I don't believe there are any jumpers of significance on that board but it might be worth, for instance, adding an infinite busy loop or some such just before the uPortDeinit() call and seeing if it can run for, say a second or two and then exit, or if it is simply time that is your enemy, IYSWIM.

mikehealy commented 1 week ago

Hi Rob,

Thanks for your replies.

if you haven't tried this already, might be to run without the file changes and see if execution gets past this point;

I tried this, but I'm not seeing any difference. I.e. it is still only getting as far as: U_APP: Running preambleHeapDefence...

I don't believe there are any jumpers of significance on that board...

I tried this with two different NUCLEO-U575ZI-Q boards, one brand new out of the box, but same behavior on both. Another developer here also repeated it, with his own build environment and hardware, and he is also seeing the same thing happen.

adding an infinite busy loop or some such just before the uPortDeinit() call and seeing if it can run for, say a second or two and then exit, or if it is simply time that is your enemy, IYSWIM.

I'm not entirely sure I follow what you're asking me to try here. What I did do is add the following code to run just before the call to uPortDeinit() (line 129 here):

    for(int i = 0; i <= 10; i++) {
       U_TEST_PRINT_LINE("\t i = %d", i);
       uPortTaskBlock(1000);
    }

which gives me the expected output (but still gets no further):

U_APP: running all functions.

U_APP: Running preambleHeapDefence...
U_PREAMBLE_TEST:     i = 0
U_PREAMBLE_TEST:     i = 1
U_PREAMBLE_TEST:     i = 2
U_PREAMBLE_TEST:     i = 3
U_PREAMBLE_TEST:     i = 4
U_PREAMBLE_TEST:     i = 5
U_PREAMBLE_TEST:     i = 6
U_PREAMBLE_TEST:     i = 7
U_PREAMBLE_TEST:     i = 8
U_PREAMBLE_TEST:     i = 9
U_PREAMBLE_TEST:     i = 10

While I didn't measure it too accurately, from a quick stopwatch eyeballing the call to uPortTaskBlock(1000) does appear to be delaying for approximately one second, which, I'm assuming, seems to indicate that the clock/tick is running OK.

We tested the preview branch with the same version of x-cube-freertos as you, v1.1.0, so it seems unlikely to be a version thing.

As a sanity check, rather than cloning from github, I also tested using the x-cube-freertos packs downloaded using STM32CubeMX. Using version 1.1.0 and 1.2.0 from here shows the same results.

So, ruling out x-cube-freertos (for now), another variable in the mix is the toolchain. I'm currently using the version included with STM32CubeCLT v1.15.1. However, when building I notice the following warning (see full build log attached):

/home/mgh/github/u-blox/ubxlib/port/platform/stm32cube/src/heap_useNewlib.c:71:2: warning: #warning "This wrapper was verified for newlib version 3.3.0; please ensure newlib's external requirements for malloc-family are unchanged!" [-Wcpp]
   71 | #warning "This wrapper was verified for newlib version 3.3.0; please ensure newlib's external requirements for malloc-family are unchanged!

Is this warning expected? The version of newlib included in the STM32CubeCLT v1.15.1 is 4.3.0. Can you tell me what toolchain version you were using to test, and/or where to find it, so that I can try that?

Thanks

RobMeades commented 1 week ago

tried this, but I'm not seeing any difference. I.e. it is still only getting as far as U_APP: Running preambleHeapDefence

As expected I think, but it was worth a try.

While I didn't measure it too accurately, from a quick stopwatch eyeballing the call to uPortTaskBlock(1000) does appear to be delaying for approximately one second, which, I'm assuming, seems to indicate that the clock/tick is running OK.

That sounds like the clock is ticking then.

Just to double-check, I have pulled the preview branch back from here onto a fresh branch locally and pushed it to our test system and that still turns out fine, build and [initial part of] run logs attached.

What can be the difference?

/home/mgh/github/u-blox/ubxlib/port/platform/stm32cube/src/heap_useNewlib.c:71:2: warning: #warning "This wrapper was verified for newlib version 3.3.0; please ensure newlib's external requirements for malloc-family are unchanged!"

This is just a paranoid printf() in case any newlib internals have changed between versions, I would have thought it relatively unlikely to be an issue. Mind you, I'd say the same about GCC version but if we're running out of options it might be worth trying the same version as we use: we build with arm_embedded_gcc-10-2020-q4-major:

https://developer.arm.com/-/media/Files/downloads/gnu-rm/10-2020q4/gcc-arm-none-eabi-10-2020-q4-major-win32.zip https://developer.arm.com/-/media/Files/downloads/gnu-rm/10-2020q4/gcc-arm-none-eabi-10-2020-q4-major-x86_64-linux.tar.bz2 https://developer.arm.com/-/media/Files/downloads/gnu-rm/10-2020q4/gcc-arm-none-eabi-10-2020-q4-major-aarch64-linux.tar.bz2

EDIT: find attached a .zip file containing the .elf file that you can see running in the attached build and run files; you could maybe try downloading that to your board? Obviously the pins and the module type will be wrong but it might be interesting to see if it gets past the preamble.

mikehealy commented 1 week ago

What can be the difference?

It seems the toolchain is the culprit.

When I build using the arm_embedded_gcc-10-2020-q4-major version (i.e. matching what you are using) it seems to run OK (see attached log: exampleHttpClient_log.txt). I do see that the final postambleResourceCheck test is failing, but I hope that's not too big a deal for now.

So the next question is why is the ST Micro toolchain causing problems? The ST Micro toolchain is a patched version of the ARM 12.3.1 release.

As a sanity check I grabbed a copy of the official 12.3.1 release from ARM (here), and I'm seeing the same behaviour as when using the ST Micro toolchain (i.e. the application gets as far as U_APP: Running preambleHeapDefence... and goes no further).

We have a strong preference for sticking with the ST Micro toolchain (this is part of a large project, and all of the partners made the decision to standardize on this version of the ST Micro toolchain early on) so do you have any thoughts about what might be going on here and/or what to do about it?

Thanks

RobMeades commented 1 week ago

Well found! I never would have expected that. We have a ticket open to move to a later GCC version, which would be GCC ARM 12.3.1 I guess. I do have a spare STM32U5 board that I can grab and set up the debugger etc., will do that now and see what gives...

RobMeades commented 1 week ago

Update: I've now tried a local build with the latest version of the GCC ARM toolchain from:

https://developer.arm.com/downloads/-/arm-gnu-toolchain-downloads

...which turns out to be 13.2.rel1, as well as with the version you are using, 12.3.rel1. I've been single-stepping carefully through execution with the debugger in the Eclipse IDE to find out what is going wrong and everything appeared to be working fine, until I looked at the trace output, which stops dead exactly as you have seen:

...
U_APP: 121: postambleResourceCheck[postamble]

U_APP: running functions that begin with "port".

U_APP: Running preambleHeapDefence...

In other words, the code is executing but all trace output is stopped by something or other that occurs in the preambleHeapDefence() function. Specifically, [successfully] executing the entirety of this:

void uPortDeinit()
{
    if (gInitialised) {
        printf("Alive 1\n");
        uPortUartDeinit();
        printf("Alive 2\n");
        uPortPrivateDeinit();
        printf("Alive 3\n");
        uPortEventQueuePrivateDeinit();
        printf("Alive 4\n");
        gInitialised = false;
    }
}

...results in this:

...
U_APP: 121: postambleResourceCheck[postamble]

U_APP: running functions that begin with "port".

U_APP: Running preambleHeapDefence...
Alive 1
Alive 2
Alive 3

No "Alive 4". Will continue to dig.

RobMeades commented 1 week ago

So the problem lies here:

https://github.com/u-blox/ubxlib/blob/ab58361971115d6287c33139dcd1dc895caa5e12/port/platform/stm32cube/src/u_port_os_pure_cmsis.c#L166-L186

As you can see, this code was added to fix a memory leak on task deletion resulting from an interaction between the way newlib (delivered with GCC ARM) is compiled and the way FreeRTOS ensures thread-safety/re-entrancy. If you're interested you can see details of the discussion four years ago here (skip to the bottom of the thread for a summary from one of my colleagues who did the investigation):

https://forums.freertos.org/t/making-sure-a-deleted-tasks-os-clib-resources-have-been-freed/11132/28

Between compiler versions (I've also tried 11.3.1 now and it shows the same thing) something has changed which means that doing the fclose() calls as above stops printf() from working thereafter. Removing the chunk of code above means printf() works, what I haven't yet been able to verify is whether there is a memory leak or not as a result (and if not how exactly to make the change conditional so as not to break ARM GCC 10 operation).

As a temporary measure, just to get things moving, I suggest you delete the above chunk of code locally, then you can move on to other things. I will update this issue when I've checked for leaks.

RobMeades commented 1 week ago

I have now verified that whatever they did in building newlib for versions of GCC ARM greater than 10 (or, for that matter, whatever newlib did in version 4, can't be sure), removing the workaround we added back in 2021 (a) does fix the problem that printf()s stop at that point and (b) does not leave us with a memory leak when a task is deleted.

Since preview_platform_stm32u5_rmea is very old now, I have rebased it on our latest master (where the newlib related fix has been applied) and force-pushed it here once more. The original preview branch is still here for now in case you need it but renamed to preview_platform_stm32u5_backup_rmea.

When you have time, please delete your old copy of the preview branch and pull the new version; this has passed our testing with ARM GCC 13.2.1 and hopefully it will work for you.

Apologies for all this preview stuff, it is extremely annoying, but we can't put these changes on master until ST provide a fix for the ThreadX memory leak issue. Once they have provided a fix and we have tested it we will merge the changes to master and, some time after that, delete the preview branches.

mikehealy commented 1 week ago

I've tested the latest preview_platform_stm32u5_backup_rmea branch using the ST Micro 12.3.1 toolchain and I can confirm that I am no longer encountering the problem. Thanks for your help!

The original preview branch is still here for now in case you need it but renamed to preview_platform_stm32u5_backup_rmea.

No need to keep this backup branch around on our account, as I can't see it being needed at any stage.

I notice that postambleResourceCheck... is still failing:

U_APP: Running postambleResourceCheck...
U_POSTAMBLE_TEST: main task stack had a minimum of 6532 byte(s) free (minimum is 5120).
U_POSTAMBLE_TEST: heap had a minimum of 725352 byte(s) free (minimum is 512000).
U_POSTAMBLE_TEST: *** ERROR *** expected 1 outstanding OS resource(s) (tasks etc.) but got 2; they might yet be cleaned up.
U_POSTAMBLE_TEST: too many resources outstanding.
/home/mgh/github/u-blox/ubxlib/port/platform/common/test/u_postamble_test.c:97:postambleResourceCheck:FAIL: Expression Evaluated To FALSE

Is this anything to be concerned about? If not I think this issue can be closed now.

RobMeades commented 1 week ago

Phew, thanks for confirming, I will delete the _backup preview branch next week.

I notice that postambleResourceCheck... is still failing

I would probably ignore it for now: sometimes when only a subset of tests/examples are being run they don't quite do the same thing as a complete set, e.g. a test might leave resources in place in order to speed up the transition between tests (a full test run is now close to 90 minutes; we'd like it to be more like 30), but if subsequent tests have been filtered out the clean-up will be have been missed.

Better to move on with your intended application I think.

I will close this issue; please feel free to re-open it, or of course open another issue, if there is more to discuss.