Closed dcpleung closed 8 months ago
I bisected this down to the tf-m 2.0.0 update #66288 .
@mswarowsky, @d3zd3z
FYI, @butok
On my side, trying to run the TFM psa_protected_storage example using:
$ west build -p always -b lpcxpresso55s69_ns samples/tfm_integration/psa_protected_storage
$ pyocd erase --mass -t LPC55S69
$ pyocd flash build/zephyr/tfm_merged.hex -t LPC55S69
Building is OK. But there is no any message in a console terminal.
I was able to reproduce the error matching the CI reports. Applying the patch at https://review.trustedfirmware.org/c/TF-M/trusted-firmware-m/+/25383 does fix the build issue. I was not able to run the tests though.
Thanks. It should be something else. It's buildable but not functional.
@butok, are you referring to a different issue than what's in this thread?
For me, this is the same issue. TFM is broken after Zephyr TFM update to v2.
I tried https://review.trustedfirmware.org/c/TF-M/trusted-firmware-m/+/25909 but I don't get an output either.
From what I understand there are 2 issues showing here.
COMPILER_CP_FLAG
not being passed to p256m
(now merged in upstream tfm)I tried building samples/hello_world
for lpcxpresso55s69_cpu0
and it works fine.
However, lpcxpresso55s69_cpu1
does not print anything. Could lpcxpresso55s69_cpu1
and lpcxpresso55s69_ns
have the same issue (that wouldn't necessarily be related to tf-m) ?
EDIT: I noticed that the dts for cpu0 and cpu1/ns are a bit different. Eg, the DMA configuration was «fixed» for cpu0 in https://github.com/zephyrproject-rtos/zephyr/pull/59767 but cpu1 and ns were not affected.
cpu1 is not used by TFM, only cpu0.
FYI:
!Attention of TFMv2 integrators is required!
It must be a problem with the tool and/or configuration and/or cmake. This was discovered by analyzing the generated hex files and map files.
The Zephyr-NS is compiled as a non-TFM application (without going to the NS slot). It is located at the beginning of the flash memory. As the result, Zephyr-NS is flashed to the beginning of the flash, then overwitted by BL2 and TFM-S.
@mswarowsky, @d3zd3z can we get some review on this issue?
Narrowing the issue: So the generated NS zephyr.hex starting address is correct 0x3000, but with the generated zephyr_ns_signed.hex the starting address was shifted to the beginning of the flash and the mistake inserted to the final tfm_merged.hex
BTW: There was found another Zephyr TFM issue. Zephyr TFM is signing S and NS images separately. But it should merge S + NS images to one image and only after that to sign it, as it is done in Upstream TFM. Please comment somebody, if this is a mistake or was done intentionally.
@butok using west build -p -b lpcxpresso55s69_ns samples/hello_world/
I get:
file | start | end | comment |
---|---|---|---|
bl2.hex |
0x1000_0000 |
0x1000_6D7F |
last 00 command starting at 6D64 with 12 bytes. |
tfm_s.hex |
0x1000_8400 |
0x1001_91AF |
last 00 command at 9198 with 16 bytes. There's a :020000041001E9 in the file. |
tfm_s_signed.hex |
0x1000_8000 |
0x1001_9551 |
last 00 command at 9550 with 1 byte. There's a :020000041001E9 in the file. |
zephyr.hex |
0x0003_0000 |
0x0003_4E3D |
last 00 command at 4E3A with 4bytes. |
zephyr_ns_signed.hex |
0x0003_0000 |
0x0003_51E7 |
last 00 command at 51E0 with 8 bytes. |
tfm_merged.hex
picks zephyr_ns_signed.hex
, bl2.hex
and tfm_s_signed.hex
in this order at their respective range.
According to the LPC5S6x datasheet, 0x0-0x9_FFFF
and 0x1000_0000-0x1009_FFFF
are respectively the ranges for the non-secure and secure flash memory space.
I haven't checked yet what is defined in the dts files but so far this seems correct to me.
Note: zephyr.hex
uses the command 02
with the parameter 3000
while zephyr_ns_signed.hex
uses the command 04
with the parameter 0003
. Both result in the same address of 0x0003_0000
according to https://en.wikipedia.org/wiki/Intel_HEX#Record_types.
S and NS should not be signed separately. It will not work. Signed NS is not checked. MCUBoot checks only one merged image. TFM S does not check NS image, so the signing chain is not implemented. Upstream TFM+BL2 is merging S+NS to one image, and only after that signs it.
I had a chat with our TF-M team and they pointed to me the parameter MCUBOOT_IMAGE_NUMBER
(CONFIG_TFM_MCUBOOT_IMAGE_NUMBER
in Zephyr) that’s controling the number of image and how they are signed (together or independently).
I see the default is 2, so The current behaviour seem to match that configuration’s default.
However, I tried building with -D CONFIG_TFM_MCUBOOT_IMAGE_NUMBER=1
and the build failed.
I continue searching on that side.
Good to know, missed this parameter. BTW: Interesting fact is that for Upstream TFM, even if MCUBOOT_IMAGE_NUMBER is set to 2, it generates tfm_s_ns_signed.bin, so it generates files for both cases 1 and 2.
Rolled Zephyr commit to 17/01/24 (b8105c1408b28ce44af948572dc445e7ee01a6cc), before TFMv2 update. TFM is built with -DMCUBOOT_IMAGE_NUMBER=2 and it works.
I tracked the failure when forcing …IMAGE_NUMBER
to 1 to https://github.com/zephyrproject-rtos/zephyr/issues/68345
I don't know what's the solution to this.
I am also not sure if that's related in anyway to the issue where no output comes from the device.
I tracked the failure when forcing
…IMAGE_NUMBER
to 1 to #68345 I don't know what's the solution to this.I am also not sure if that's related in anyway to the issue where no output comes from the device.
So MCUBOOT_IMAGE_NUMBER=1 is the common TFM issue. And the older TFM works with MCUBOOT_IMAGE_NUMBER=2, so the new TFM should also work with it.
BTW: Do we have run results for other platforms?
west build -p -b mps2_an521_ns samples/hello_world/ -D CONFIG_TFM_MCUBOOT_IMAGE_NUMBER=2
does run fine.
I get the same build failure with …_IMAGE_NUMBER=1
though.
EDIT: I checked with our TF-M team and AFAIU, platforms are not tested with both.
They would typically only support one of them and most recent platform only support 2
.
So, as the images that don't boot, they seem to be hanging in the function USART_Deinit
:
(gdb) bt
#0 USART_Deinit (base=0x40086000)
at /home/davidb/linaro/zep/modules/tee/tf-m/trusted-firmware-m/platform/ext/target/nxp/common/Native_Driver/drivers/fsl_usart.c:279
#1 0x10002b9a in ARM_USARTx_Deinitialize (uart_dev=0x30000450 <USART_DEV>)
at /home/davidb/linaro/zep/modules/tee/tf-m/trusted-firmware-m/platform/ext/target/nxp/common/CMSIS_Driver/Driver_USART.c:121
#2 ARM_USART_Uninitialize ()
at /home/davidb/linaro/zep/modules/tee/tf-m/trusted-firmware-m/platform/ext/target/nxp/common/CMSIS_Driver/Driver_USART.c:189
#3 0x10000818 in do_boot (rsp=0x30000508 <rsp>)
at /home/davidb/linaro/zep/modules/tee/tf-m/trusted-firmware-m/bl2/ext/mcuboot/bl2_main.c:85
#4 main () at /home/davidb/linaro/zep/modules/tee/tf-m/trusted-firmware-m/bl2/ext/mcuboot/bl2_main.c:191
#5 0x1000020e in _start ()
This is very early in startup, and is pointing to maybe an integration problem with the HAL.
It is reading from base->STAT
, expecting certain bits to be set, which never get set.
It's interesting. For me, it is stuck in MCUBoot, bl2_main.c => Line 179, panic error "Unable to find bootable image". At the same time Zephyr MCUBoot (without TFMv2) is fully functional.
It is reading from base->STAT, expecting certain bits to be set, which never get set.
It is set statically, in the trusted-firmware-m/platform/ext/target/nxp/common/CMSIS_Driver/Driver_USART.c, line 164:
static UARTx_Resources USART_DEV = {
.base = USART_BASE,
};
@butok The base address of the uart is set statically indeed, but isn’t base->STAT
a register?
@butok The base address of the uart is set statically indeed, but isn’t
base->STAT
a register?
Yes, base is set by SW, but ->STAT is the USART status register.
It does appear that if I remove the line that is hanging, this doesn't happen. I will continue to try to figure out what is going on here. The code appears to be calling the uart deinit before it inits it.
--- a/platform/ext/target/nxp/common/Native_Driver/drivers/fsl_usart.c
+++ b/platform/ext/target/nxp/common/Native_Driver/drivers/fsl_usart.c
@@ -276,9 +276,11 @@ void USART_Deinit(USART_Type *base)
{
/* Check arguments */
assert(NULL != base);
+ /*
while (0U == (base->STAT & USART_STAT_TXIDLE_MASK))
{
}
+ */
/* Disable interrupts, disable dma requests, disable peripheral */
base->FIFOINTENCLR = USART_FIFOINTENCLR_TXERR_MASK | USART_FIFOINTENCLR_RXERR_MASK | USART_FIFOINTENCLR_TXLVL_MASK |
USART_FIFOINTENCLR_RXLVL_MASK;
So, the usart driver seems to be working, it just is having a problem waiting for the device to become idle before it does the deinit.
Thank you very much @d3zd3z ! The issue is that the USART deinitialization is called without its initialization.
If to add the is_initialized
flag to the TFM CMSIS driver to avoid this case, it works:
--- "a/platform/ext/target/nxp/common/CMSIS_Driver/Driver_USART.c"
+++ "b/platform/ext/target/nxp/common/CMSIS_Driver/Driver_USART.c"
@@ -38,6 +38,7 @@ typedef struct {
usart_config_t config; /* USART configuration structure */
uint32_t tx_nbr_bytes; /* Number of bytes transfered */
uint32_t rx_nbr_bytes; /* Number of bytes recevied */
+ bool is_initialized; /* true if initialized */
} UARTx_Resources;
/* Driver version */
@@ -87,13 +88,15 @@ static ARM_USART_CAPABILITIES ARM_USART_GetCapabilities(void)
static int32_t ARM_USARTx_Initialize(UARTx_Resources* uart_dev)
{
#if (__ARM_FEATURE_CMSE & 0x2) /* Initialize once in S */
uint32_t usartClkFreq;
usartClkFreq = BOARD_DEBUG_UART_CLK_FREQ;
- USART_Init(uart_dev->base, &uart_dev->config, usartClkFreq);
+ if (USART_Init(uart_dev->base, &uart_dev->config, usartClkFreq) == kStatus_Success) {
+ uart_dev->is_initialized = true;
+ }
#endif
return ARM_DRIVER_OK;
@@ -118,7 +121,10 @@ static int32_t ARM_USARTx_PowerControl(UARTx_Resources* uart_dev,
static int32_t ARM_USARTx_Deinitialize(UARTx_Resources* uart_dev)
{
- USART_Deinit(uart_dev->base);
+ if(uart_dev->is_initialized) {
+ USART_Deinit(uart_dev->base);
+ uart_dev->is_initialized = false;
+ }
return ARM_DRIVER_OK;
}
Are you OK with this solution? May I add the PR to the Zephyr-TFM repository to speedup Zephyr fixing? Or do you prefer push it to the Upstream TFM first? I will wait for your decision.
Still there are two open questions:
The PR has been submitted to the Upstream TFM: https://review.trustedfirmware.org/c/TF-M/trusted-firmware-m/+/26338
At least initially, we should fix this here. I'm not sure what the right upstream fix is, since this code comes from the LPC SDK and not directly from TF-M. It may also be better to try and track down why TF-M does the deinit first. I believe it has to do with how the logging gets initialized, but it still seems weird.
At least initially, we should fix this here. I'm not sure what the right upstream fix is, since this code comes from the LPC SDK and not directly from TF-M. It may also be better to try and track down why TF-M does the deinit first. I believe it has to do with how the logging gets initialized, but it still seems weird.
The changed code in https://review.trustedfirmware.org/c/TF-M/trusted-firmware-m/+/26338 is part of TFM port (not SDK). Also, I have reported the possible infinite loop in the low level driver to the SDK team. But we should not wait for it, as it should be covered by the purposed fix.
Please note that CI is also failing build on this:
west build -p -b lpcxpresso55s69_ns tests/lib/cmsis_dsp/distance -T libraries.cmsis_dsp.distance.fpu
Similar error message
Please note that CI is also failing build on this:
west build -p -b lpcxpresso55s69_ns tests/lib/cmsis_dsp/distance -T libraries.cmsis_dsp.distance.fpu
Similar error message
It should be solved by https://github.com/zephyrproject-rtos/zephyr/issues/67751#issuecomment-1907889183
@d3zd3z, @SebastianBoe, how should we integrate the patch @butok submitted to the TF-M upstream into our fork? This bug is marked as high and we are currently in RC1 of our release flow. Should @butok reproduce this patch here in our fork or will one of you pull an update from TFM upstream?
Cherry pick the patch into our tfm, put a tag with the SoB with a link to the upstream issue/fix, and we can update our version.
Just to be clear here. There is no need to wait for the tf-m upstream to merge or accept the change into our tf-m repo. That's kind of the point of our repo, so that we can fix things we need for our releases. When a future update to tf-m is made, the change will be reverted with the rest, and only applied if it didn't make it into upstream.
So, please make a PR against zephyrproject-rtos/trusted-firmware-m that applies this patch.
@butok, to be clear, we do not need to wait on the upstream TF-M PR you submitted to be merged. We need to submit this patch to our Zephyr Project fork now to unblock the release. So please go ahead and submit a PR in Zephyr to fix this.
Hi @d3zd3z the PRs were submitted, for your review & merge: https://github.com/zephyrproject-rtos/trusted-firmware-m/pull/103 https://github.com/zephyrproject-rtos/trusted-firmware-m/pull/102
Describe the bug This was discovered in CI where
lpcxpresso55s69_ns
failed to build ontests/arch/common/timing
:If
CONFIG_FPU
is disabled, this builds without any errors. So my guess is that FPU related compiler flags are not consistently used throughout the build.To Reproduce Steps to reproduce the behavior:
Expected behavior Should build without any errors.
Impact CI failing on PRs.
Environment (please complete the following information):