zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.73k stars 6.56k forks source link

`lpcxpresso55s69_ns` build error on `tests/arch/common/timing` about VFP register arguments #67751

Closed dcpleung closed 8 months ago

dcpleung commented 9 months ago

Describe the bug This was discovered in CI where lpcxpresso55s69_ns failed to build on tests/arch/common/timing:

zephyr-sdk-0.16.4/arm-zephyr-eabi/bin/../lib/gcc/arm-zephyr-eabi/12.2.0/../../../../arm-zephyr-eabi/bin/ld: error: bin/tfm_s.axf uses VFP register arguments, secure_fw/partitions/crypto/mbedcrypto/3rdparty/p256-m/libcrypto_service_p256m.a(p256-m_driver_entrypoints.o) does not
zephyr-sdk-0.16.4/arm-zephyr-eabi/bin/../lib/gcc/arm-zephyr-eabi/12.2.0/../../../../arm-zephyr-eabi/bin/ld: failed to merge target specific data of file secure_fw/partitions/crypto/mbedcrypto/3rdparty/p256-m/libcrypto_service_p256m.a(p256-m_driver_entrypoints.o)
zephyr-sdk-0.16.4/arm-zephyr-eabi/bin/../lib/gcc/arm-zephyr-eabi/12.2.0/../../../../arm-zephyr-eabi/bin/ld: error: bin/tfm_s.axf uses VFP register arguments, secure_fw/partitions/crypto/mbedcrypto/3rdparty/p256-m/libcrypto_service_p256m.a(p256-m.o) does not
zephyr-sdk-0.16.4/arm-zephyr-eabi/bin/../lib/gcc/arm-zephyr-eabi/12.2.0/../../../../arm-zephyr-eabi/bin/ld: failed to merge target specific data of file secure_fw/partitions/crypto/mbedcrypto/3rdparty/p256-m/libcrypto_service_p256m.a(p256-m.o)

If CONFIG_FPU is disabled, this builds without any errors. So my guess is that FPU related compiler flags are not consistently used throughout the build.

To Reproduce Steps to reproduce the behavior:

  1. west build -p -b lpcxpresso55s69_ns tests/arch/common/timing -T arch.common.timing

Expected behavior Should build without any errors.

Impact CI failing on PRs.

Environment (please complete the following information):

dleach02 commented 9 months ago

I bisected this down to the tf-m 2.0.0 update #66288 .

@mswarowsky, @d3zd3z

FYI, @butok

butok commented 9 months ago

On my side, trying to run the TFM psa_protected_storage example using:

$ west build -p always -b lpcxpresso55s69_ns samples/tfm_integration/psa_protected_storage
$ pyocd erase --mass -t LPC55S69
$ pyocd flash build/zephyr/tfm_merged.hex -t LPC55S69

Building is OK. But there is no any message in a console terminal.

ithinuel commented 9 months ago

I was able to reproduce the error matching the CI reports. Applying the patch at https://review.trustedfirmware.org/c/TF-M/trusted-firmware-m/+/25383 does fix the build issue. I was not able to run the tests though.

butok commented 9 months ago

Thanks. It should be something else. It's buildable but not functional.

ithinuel commented 9 months ago

@butok, are you referring to a different issue than what's in this thread?

butok commented 9 months ago

For me, this is the same issue. TFM is broken after Zephyr TFM update to v2.

ithinuel commented 9 months ago

I tried https://review.trustedfirmware.org/c/TF-M/trusted-firmware-m/+/25909 but I don't get an output either.

From what I understand there are 2 issues showing here.

I tried building samples/hello_world for lpcxpresso55s69_cpu0 and it works fine. However, lpcxpresso55s69_cpu1 does not print anything. Could lpcxpresso55s69_cpu1 and lpcxpresso55s69_ns have the same issue (that wouldn't necessarily be related to tf-m) ?

EDIT: I noticed that the dts for cpu0 and cpu1/ns are a bit different. Eg, the DMA configuration was «fixed» for cpu0 in https://github.com/zephyrproject-rtos/zephyr/pull/59767 but cpu1 and ns were not affected.

butok commented 9 months ago

cpu1 is not used by TFM, only cpu0.

butok commented 9 months ago

FYI:

butok commented 9 months ago

!Attention of TFMv2 integrators is required!

It must be a problem with the tool and/or configuration and/or cmake. This was discovered by analyzing the generated hex files and map files.

The Zephyr-NS is compiled as a non-TFM application (without going to the NS slot). It is located at the beginning of the flash memory. As the result, Zephyr-NS is flashed to the beginning of the flash, then overwitted by BL2 and TFM-S.

dleach02 commented 9 months ago

@mswarowsky, @d3zd3z can we get some review on this issue?

butok commented 8 months ago

Narrowing the issue: So the generated NS zephyr.hex starting address is correct 0x3000, but with the generated zephyr_ns_signed.hex the starting address was shifted to the beginning of the flash and the mistake inserted to the final tfm_merged.hex

BTW: There was found another Zephyr TFM issue. Zephyr TFM is signing S and NS images separately. But it should merge S + NS images to one image and only after that to sign it, as it is done in Upstream TFM. Please comment somebody, if this is a mistake or was done intentionally.

ithinuel commented 8 months ago

@butok using west build -p -b lpcxpresso55s69_ns samples/hello_world/ I get:

file start end comment
bl2.hex 0x1000_0000 0x1000_6D7F last 00 command starting at 6D64 with 12 bytes.
tfm_s.hex 0x1000_8400 0x1001_91AF last 00 command at 9198 with 16 bytes. There's a :020000041001E9 in the file.
tfm_s_signed.hex 0x1000_8000 0x1001_9551 last 00 command at 9550 with 1 byte. There's a :020000041001E9 in the file.
zephyr.hex 0x0003_0000 0x0003_4E3D last 00 command at 4E3A with 4bytes.
zephyr_ns_signed.hex 0x0003_0000 0x0003_51E7 last 00 command at 51E0 with 8 bytes.

tfm_merged.hex picks zephyr_ns_signed.hex, bl2.hex and tfm_s_signed.hex in this order at their respective range.

According to the LPC5S6x datasheet, 0x0-0x9_FFFF and 0x1000_0000-0x1009_FFFF are respectively the ranges for the non-secure and secure flash memory space. I haven't checked yet what is defined in the dts files but so far this seems correct to me.

Note: zephyr.hex uses the command 02 with the parameter 3000 while zephyr_ns_signed.hex uses the command 04 with the parameter 0003. Both result in the same address of 0x0003_0000 according to https://en.wikipedia.org/wiki/Intel_HEX#Record_types.

butok commented 8 months ago

S and NS should not be signed separately. It will not work. Signed NS is not checked. MCUBoot checks only one merged image. TFM S does not check NS image, so the signing chain is not implemented. Upstream TFM+BL2 is merging S+NS to one image, and only after that signs it.

ithinuel commented 8 months ago

I had a chat with our TF-M team and they pointed to me the parameter MCUBOOT_IMAGE_NUMBER (CONFIG_TFM_MCUBOOT_IMAGE_NUMBER in Zephyr) that’s controling the number of image and how they are signed (together or independently).

I see the default is 2, so The current behaviour seem to match that configuration’s default.

However, I tried building with -D CONFIG_TFM_MCUBOOT_IMAGE_NUMBER=1 and the build failed. I continue searching on that side.

butok commented 8 months ago

Good to know, missed this parameter. BTW: Interesting fact is that for Upstream TFM, even if MCUBOOT_IMAGE_NUMBER is set to 2, it generates tfm_s_ns_signed.bin, so it generates files for both cases 1 and 2.

butok commented 8 months ago

Rolled Zephyr commit to 17/01/24 (b8105c1408b28ce44af948572dc445e7ee01a6cc), before TFMv2 update. TFM is built with -DMCUBOOT_IMAGE_NUMBER=2 and it works.

ithinuel commented 8 months ago

I tracked the failure when forcing …IMAGE_NUMBER to 1 to https://github.com/zephyrproject-rtos/zephyr/issues/68345 I don't know what's the solution to this.

I am also not sure if that's related in anyway to the issue where no output comes from the device.

butok commented 8 months ago

I tracked the failure when forcing …IMAGE_NUMBER to 1 to #68345 I don't know what's the solution to this.

I am also not sure if that's related in anyway to the issue where no output comes from the device.

So MCUBOOT_IMAGE_NUMBER=1 is the common TFM issue. And the older TFM works with MCUBOOT_IMAGE_NUMBER=2, so the new TFM should also work with it.

BTW: Do we have run results for other platforms?

ithinuel commented 8 months ago

west build -p -b mps2_an521_ns samples/hello_world/ -D CONFIG_TFM_MCUBOOT_IMAGE_NUMBER=2 does run fine.

I get the same build failure with …_IMAGE_NUMBER=1 though. EDIT: I checked with our TF-M team and AFAIU, platforms are not tested with both. They would typically only support one of them and most recent platform only support 2.

d3zd3z commented 8 months ago

So, as the images that don't boot, they seem to be hanging in the function USART_Deinit:

(gdb) bt
#0  USART_Deinit (base=0x40086000)
    at /home/davidb/linaro/zep/modules/tee/tf-m/trusted-firmware-m/platform/ext/target/nxp/common/Native_Driver/drivers/fsl_usart.c:279
#1  0x10002b9a in ARM_USARTx_Deinitialize (uart_dev=0x30000450 <USART_DEV>)
    at /home/davidb/linaro/zep/modules/tee/tf-m/trusted-firmware-m/platform/ext/target/nxp/common/CMSIS_Driver/Driver_USART.c:121
#2  ARM_USART_Uninitialize ()
    at /home/davidb/linaro/zep/modules/tee/tf-m/trusted-firmware-m/platform/ext/target/nxp/common/CMSIS_Driver/Driver_USART.c:189
#3  0x10000818 in do_boot (rsp=0x30000508 <rsp>)
    at /home/davidb/linaro/zep/modules/tee/tf-m/trusted-firmware-m/bl2/ext/mcuboot/bl2_main.c:85
#4  main () at /home/davidb/linaro/zep/modules/tee/tf-m/trusted-firmware-m/bl2/ext/mcuboot/bl2_main.c:191
#5  0x1000020e in _start ()

This is very early in startup, and is pointing to maybe an integration problem with the HAL.

d3zd3z commented 8 months ago

It is reading from base->STAT, expecting certain bits to be set, which never get set.

butok commented 8 months ago

It's interesting. For me, it is stuck in MCUBoot, bl2_main.c => Line 179, panic error "Unable to find bootable image". At the same time Zephyr MCUBoot (without TFMv2) is fully functional.

butok commented 8 months ago

It is reading from base->STAT, expecting certain bits to be set, which never get set.

It is set statically, in the trusted-firmware-m/platform/ext/target/nxp/common/CMSIS_Driver/Driver_USART.c, line 164:

static UARTx_Resources USART_DEV = {
    .base = USART_BASE,
};
ithinuel commented 8 months ago

@butok The base address of the uart is set statically indeed, but isn’t base->STAT a register?

butok commented 8 months ago

@butok The base address of the uart is set statically indeed, but isn’t base->STAT a register?

Yes, base is set by SW, but ->STAT is the USART status register.

d3zd3z commented 8 months ago

It does appear that if I remove the line that is hanging, this doesn't happen. I will continue to try to figure out what is going on here. The code appears to be calling the uart deinit before it inits it.

--- a/platform/ext/target/nxp/common/Native_Driver/drivers/fsl_usart.c
+++ b/platform/ext/target/nxp/common/Native_Driver/drivers/fsl_usart.c
@@ -276,9 +276,11 @@ void USART_Deinit(USART_Type *base)
 {
     /* Check arguments */
     assert(NULL != base);
+    /*
     while (0U == (base->STAT & USART_STAT_TXIDLE_MASK))
     {
     }
+    */
     /* Disable interrupts, disable dma requests, disable peripheral */
     base->FIFOINTENCLR = USART_FIFOINTENCLR_TXERR_MASK | USART_FIFOINTENCLR_RXERR_MASK | USART_FIFOINTENCLR_TXLVL_MASK |
                          USART_FIFOINTENCLR_RXLVL_MASK;
d3zd3z commented 8 months ago

So, the usart driver seems to be working, it just is having a problem waiting for the device to become idle before it does the deinit.

butok commented 8 months ago

Thank you very much @d3zd3z ! The issue is that the USART deinitialization is called without its initialization.

If to add the is_initialized flag to the TFM CMSIS driver to avoid this case, it works:

--- "a/platform/ext/target/nxp/common/CMSIS_Driver/Driver_USART.c"
+++ "b/platform/ext/target/nxp/common/CMSIS_Driver/Driver_USART.c"
@@ -38,6 +38,7 @@ typedef struct {
     usart_config_t  config;         /* USART configuration structure */
     uint32_t        tx_nbr_bytes;   /* Number of bytes transfered */
     uint32_t        rx_nbr_bytes;   /* Number of bytes recevied */
+    bool            is_initialized; /* true if initialized */
 } UARTx_Resources;

 /* Driver version */
@@ -87,13 +88,15 @@ static ARM_USART_CAPABILITIES ARM_USART_GetCapabilities(void)

 static int32_t ARM_USARTx_Initialize(UARTx_Resources* uart_dev)
 {
 #if (__ARM_FEATURE_CMSE & 0x2) /* Initialize once in S */
     uint32_t usartClkFreq;

     usartClkFreq = BOARD_DEBUG_UART_CLK_FREQ;

-    USART_Init(uart_dev->base, &uart_dev->config, usartClkFreq);
+    if (USART_Init(uart_dev->base, &uart_dev->config, usartClkFreq) == kStatus_Success) {
+        uart_dev->is_initialized = true;
+    }
 #endif

     return ARM_DRIVER_OK;
@@ -118,7 +121,10 @@ static int32_t ARM_USARTx_PowerControl(UARTx_Resources* uart_dev,

 static int32_t ARM_USARTx_Deinitialize(UARTx_Resources* uart_dev)
 {
-    USART_Deinit(uart_dev->base);
+    if(uart_dev->is_initialized) {
+        USART_Deinit(uart_dev->base);
+        uart_dev->is_initialized = false;
+    }

     return ARM_DRIVER_OK;
 }

Are you OK with this solution? May I add the PR to the Zephyr-TFM repository to speedup Zephyr fixing? Or do you prefer push it to the Upstream TFM first? I will wait for your decision.

Still there are two open questions:

butok commented 8 months ago

The PR has been submitted to the Upstream TFM: https://review.trustedfirmware.org/c/TF-M/trusted-firmware-m/+/26338

d3zd3z commented 8 months ago

At least initially, we should fix this here. I'm not sure what the right upstream fix is, since this code comes from the LPC SDK and not directly from TF-M. It may also be better to try and track down why TF-M does the deinit first. I believe it has to do with how the logging gets initialized, but it still seems weird.

butok commented 8 months ago

At least initially, we should fix this here. I'm not sure what the right upstream fix is, since this code comes from the LPC SDK and not directly from TF-M. It may also be better to try and track down why TF-M does the deinit first. I believe it has to do with how the logging gets initialized, but it still seems weird.

The changed code in https://review.trustedfirmware.org/c/TF-M/trusted-firmware-m/+/26338 is part of TFM port (not SDK). Also, I have reported the possible infinite loop in the low level driver to the SDK team. But we should not wait for it, as it should be covered by the purposed fix.

dleach02 commented 8 months ago

Please note that CI is also failing build on this:

west build -p -b lpcxpresso55s69_ns tests/lib/cmsis_dsp/distance -T libraries.cmsis_dsp.distance.fpu

Similar error message

butok commented 8 months ago

Please note that CI is also failing build on this:

west build -p -b lpcxpresso55s69_ns tests/lib/cmsis_dsp/distance -T libraries.cmsis_dsp.distance.fpu

Similar error message

It should be solved by https://github.com/zephyrproject-rtos/zephyr/issues/67751#issuecomment-1907889183

dleach02 commented 8 months ago

@d3zd3z, @SebastianBoe, how should we integrate the patch @butok submitted to the TF-M upstream into our fork? This bug is marked as high and we are currently in RC1 of our release flow. Should @butok reproduce this patch here in our fork or will one of you pull an update from TFM upstream?

d3zd3z commented 8 months ago

Cherry pick the patch into our tfm, put a tag with the SoB with a link to the upstream issue/fix, and we can update our version.

d3zd3z commented 8 months ago

Just to be clear here. There is no need to wait for the tf-m upstream to merge or accept the change into our tf-m repo. That's kind of the point of our repo, so that we can fix things we need for our releases. When a future update to tf-m is made, the change will be reverted with the rest, and only applied if it didn't make it into upstream.

So, please make a PR against zephyrproject-rtos/trusted-firmware-m that applies this patch.

dleach02 commented 8 months ago

@butok, to be clear, we do not need to wait on the upstream TF-M PR you submitted to be merged. We need to submit this patch to our Zephyr Project fork now to unblock the release. So please go ahead and submit a PR in Zephyr to fix this.

butok commented 8 months ago

Hi @d3zd3z the PRs were submitted, for your review & merge: https://github.com/zephyrproject-rtos/trusted-firmware-m/pull/103 https://github.com/zephyrproject-rtos/trusted-firmware-m/pull/102