zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.88k stars 6.63k forks source link

Build issues observed when using CONFIG_BUILD_OUTPUT_ADJUST_LMA #58080

Closed gmarull closed 9 months ago

gmarull commented 1 year ago

Describe the bug

I discovered a heisenbug while adding this new feature to Zephyr: https://github.com/zephyrproject-rtos/zephyr/pull/57229. Some failures with objcopy occur when using CONFIG_BUILD_OUTPUT_ADJUST_LMA, e.g. when compiling some MCUboot tests.

To Reproduce

Use https://github.com/zephyrproject-rtos/zephyr/pull/57229 branch and build:

west build -p -b nrf52840dk_nrf52840 -T tests/boot/test_mcuboot/boot.mcuboot --sysbuild

If the following patch is applied to the branch, the problem disappears:

diff --git a/subsys/pm/policy.c b/subsys/pm/policy.c
index 3b4132c335..31eddf2d28 100644
--- a/subsys/pm/policy.c
+++ b/subsys/pm/policy.c
@@ -122,7 +122,7 @@ static void update_next_event(uint32_t cyc)
                new_next_event_cyc -= UINT32_MAX + 1U;
        }

-       next_event_cyc = new_next_event_cyc;
+       /* next_event_cyc = new_next_event_cyc; */
 }

 #ifdef CONFIG_PM_POLICY_DEFAULT

Expected behavior Build succeeds

Impact Blocker

Logs and console output

FAILED: zephyr/zephyr.elf zephyr/zephyr.map zephyr/zephyr.hex zephyr/zephyr.bin zephyr/zephyr.lst zephyr/zephyr.stat zephyr/zephyr.signed.bin zephyr/zephyr.signed.hex /home/gmarull/ws/nordic/zephyrproject/zephyr/build/swapped_app/zephyr/zephyr.map /home/gmarull/ws/nordic/zephyrproject/zephyr/build/swapped_app/zephyr/zephyr.hex /home/gmarull/ws/nordic/zephyrproject/zephyr/build/swapped_app/zephyr/zephyr.bin /home/gmarull/ws/nordic/zephyrproject/zephyr/build/swapped_app/zephyr/zephyr.lst /home/gmarull/ws/nordic/zephyrproject/zephyr/build/swapped_app/zephyr/zephyr.stat /home/gmarull/ws/nordic/zephyrproject/zephyr/build/swapped_app/zephyr/zephyr.signed.bin /home/gmarull/ws/nordic/zephyrproject/zephyr/build/swapped_app/zephyr/zephyr.signed.hex 
: && /home/gmarull/zephyr-sdk-0.15.1/arm-zephyr-eabi/bin/arm-zephyr-eabi-gcc  -gdwarf-4 zephyr/CMakeFiles/zephyr_final.dir/misc/empty_file.c.obj zephyr/CMakeFiles/zephyr_final.dir/dev_handles.c.obj zephyr/CMakeFiles/zephyr_final.dir/isr_tables.c.obj -o zephyr/zephyr.elf  zephyr/CMakeFiles/offsets.dir/./arch/arm/core/offsets/offsets.c.obj  -fuse-ld=bfd  -Wl,-T  zephyr/linker.cmd  -Wl,-Map=/home/gmarull/ws/nordic/zephyrproject/zephyr/build/swapped_app/zephyr/zephyr_final.map  -Wl,--whole-archive  app/libapp.a  zephyr/libzephyr.a  zephyr/arch/common/libarch__common.a  zephyr/arch/arch/arm/core/aarch32/libarch__arm__core__aarch32.a  zephyr/arch/arch/arm/core/aarch32/cortex_m/libarch__arm__core__aarch32__cortex_m.a  zephyr/arch/arch/arm/core/aarch32/mpu/libarch__arm__core__aarch32__mpu.a  zephyr/lib/libc/minimal/liblib__libc__minimal.a  zephyr/lib/libc/common/liblib__libc__common.a  zephyr/soc/arm/common/cortex_m/libsoc__arm__common__cortex_m.a  zephyr/soc/arm/nordic_nrf/nrf52/libsoc__arm__nordic_nrf__nrf52.a  zephyr/drivers/clock_control/libdrivers__clock_control.a  zephyr/drivers/console/libdrivers__console.a  zephyr/drivers/gpio/libdrivers__gpio.a  zephyr/drivers/pinctrl/libdrivers__pinctrl.a  zephyr/drivers/serial/libdrivers__serial.a  zephyr/drivers/timer/libdrivers__timer.a  modules/hal_nordic/nrfx/libmodules__hal_nordic__nrfx.a  modules/segger/libmodules__segger.a  -Wl,--no-whole-archive  zephyr/kernel/libkernel.a  -L"/home/gmarull/zephyr-sdk-0.15.1/arm-zephyr-eabi/bin/../lib/gcc/arm-zephyr-eabi/12.1.0/thumb/v7e-m/nofp"  -L/home/gmarull/ws/nordic/zephyrproject/zephyr/build/swapped_app/zephyr  -lgcc  -Wl,--print-memory-usage  zephyr/arch/common/libisr_tables.a  -mcpu=cortex-m4  -mthumb  -mabi=aapcs  -Wl,--gc-sections  -Wl,--build-id=none  -Wl,--sort-common=descending  -Wl,--sort-section=alignment  -Wl,-u,_OffsetAbsSyms  -Wl,-u,_ConfigAbsSyms  -nostdlib  -static  -Wl,-X  -Wl,-N  -Wl,--orphan-handling=warn  -Wl,-no-pie && cd /home/gmarull/ws/nordic/zephyrproject/zephyr/build/swapped_app/zephyr && /usr/bin/cmake -E copy zephyr_final.map zephyr.map && /home/gmarull/zephyr-sdk-0.15.1/arm-zephyr-eabi/bin/arm-zephyr-eabi-objcopy --change-section-lma *+483328 zephyr.elf zephyr.elf && /home/gmarull/zephyr-sdk-0.15.1/arm-zephyr-eabi/bin/arm-zephyr-eabi-objcopy --gap-fill 0xff --output-target=ihex --remove-section=.comment --remove-section=COMMON --remove-section=.eh_frame zephyr.elf zephyr.hex && /home/gmarull/zephyr-sdk-0.15.1/arm-zephyr-eabi/bin/arm-zephyr-eabi-objcopy --gap-fill 0xff --output-target=binary --remove-section=.comment --remove-section=COMMON --remove-section=.eh_frame zephyr.elf zephyr.bin && /home/gmarull/zephyr-sdk-0.15.1/arm-zephyr-eabi/bin/arm-zephyr-eabi-objdump -d -S zephyr.elf > zephyr.lst && /home/gmarull/zephyr-sdk-0.15.1/arm-zephyr-eabi/bin/arm-zephyr-eabi-readelf -e zephyr.elf > zephyr.stat && /home/gmarull/ws/nordic/zephyrproject/.venv/bin/python3.11 -m west sign --quiet --tool imgtool --tool-path /home/gmarull/ws/nordic/zephyrproject/.venv/bin/imgtool --build-dir /home/gmarull/ws/nordic/zephyrproject/zephyr/build/swapped_app --bin --sbin /home/gmarull/ws/nordic/zephyrproject/zephyr/build/swapped_app/zephyr/zephyr.signed.bin --hex --shex /home/gmarull/ws/nordic/zephyrproject/zephyr/build/swapped_app/zephyr/zephyr.signed.hex -- --key /home/gmarull/ws/nordic/zephyrproject/bootloader/mcuboot/root-rsa-2048.pem
Memory region         Used Size  Region Size  %age Used
           FLASH:       20600 B       472 KB      4.26%
             RAM:        5568 B       256 KB      2.12%
        IDT_LIST:          0 GB         2 KB      0.00%
/home/gmarull/zephyr-sdk-0.15.1/arm-zephyr-eabi/bin/arm-zephyr-eabi-objcopy: zephyr.elf: error in private header data: sorry, cannot handle this file
ninja: build stopped: subcommand failed.
[3/9] Completed 'test_mcuboot'
FAILED: swapped_app-prefix/src/swapped_app-stamp/swapped_app-build /home/gmarull/ws/nordic/zephyrproject/zephyr/build/swapped_app-prefix/src/swapped_app-stamp/swapped_app-build 
cd /home/gmarull/ws/nordic/zephyrproject/zephyr/build/swapped_app && /usr/bin/cmake --build .
ninja: build stopped: subcommand failed.

Environment (please complete the following information):

Additional context N/A

stephanosio commented 1 year ago

The problem is that the BFD program header rewrite implementation makes a questionable assumption that the LMA of a segment will be aligned to the alignment of the first section in the segment:

              /* If the first section in a segment does not start at
             the beginning of the segment, then something is
             wrong.  */
              if (align_power (map->p_paddr
                       + (map->includes_filehdr
                      ? iehdr->e_ehsize : 0)
                       + (map->includes_phdrs
                      ? iehdr->e_phnum * iehdr->e_phentsize
                      : 0),
                       output_section->alignment_power * opb)
              != output_section->lma * opb)
            goto sorry;

This is not a problem when the LMA is equal to the VMA and both are aligned at the section alignment; but, for the sections that have different VMA and LMA, only the VMA is guaranteed to be aligned at the section alignment and the LMA may or may not be aligned at the same boundary.

Now looking at the cases with next_event_cyc = new_next_event_cyc; (bad) and without (good).

Good:

Program Header:
[...]
    LOAD off    0x00005044 vaddr 0x20000000 paddr 0x00010f64 align 2**2
         filesz 0x000000b8 memsz 0x000000b8 flags rw-
[...]
Sections:
Idx Name          Size      VMA       LMA       File off  Algn
[...]
 10 datas         000000b0  20000000  00010f64  00005044  2**2
                  CONTENTS, ALLOC, LOAD, DATA
[...]

Bad:

Program Header:
[...]
    LOAD off    0x00005098 vaddr 0x20000000 paddr 0x00010fb4 align 2**3
         filesz 0x000000c0 memsz 0x000000c0 flags rw-
[...]
Sections:
Idx Name          Size      VMA       LMA       File off  Algn
[...]
10 datas         000000b8  20000000  00010fb4  00005098  2**3
                  CONTENTS, ALLOC, LOAD, DATA
[...]

(the full dump can be found in https://gist.github.com/stephanosio/5946bd6f8f249b5329fb4982f87cd58e)

Note that, for both good and bad, the datas section is the first section in a segment.

For good, the LMA is 0x10f64 which is aligned at 2**2. For bad, the LMA is 0x10fb4 which is NOT aligned at 2**3.

As seen above, the BFD implementation expects the LMA of a segment to be aligned at the alignment of the first section of the segment, and attempts to align the address of the segment LMA to the alignment of the first section before comparing it to the actual LMA of the first section for the purpose of verifying that the first section starts at the beginning of the segment.

When running objcopy --change-section-lma *+483328 ..., which offsets the LMA by 0x76000:

In case of good, since the segment LMA is divisible by 4 (2**2), this results in 0x86f64 == 0x86f64, which evaluates to true. In case of bad, since the segment LMA is NOT divisible by 8 (2**3), this results in 0x86fb8 == 0x86fb4, which evaluates to false.

Since the align_power() is applied on the segment LMA mainly for the purpose of handling the alignment in case there are file or program headers in the segment (the expectation is that, if there are file and program headers in a segment, then the linker would have respected the section alignment for the first section), adding an alternate check without the segment LMA alignment for the header-less case would make sense to support the VMA != LMA case.

diff --git a/bfd/elf.c b/bfd/elf.c
index 79f71aa81e2..a0b65c4fd3b 100644
--- a/bfd/elf.c
+++ b/bfd/elf.c
@@ -7412,7 +7412,7 @@ rewrite_elf_program_header (bfd *ibfd, bfd *obfd, bfd_vma maxpagesize)
              /* If the first section in a segment does not start at
             the beginning of the segment, then something is
             wrong.  */
-             if (align_power (map->p_paddr
+             if ((align_power (map->p_paddr
                       + (map->includes_filehdr
                      ? iehdr->e_ehsize : 0)
                       + (map->includes_phdrs
@@ -7420,6 +7420,7 @@ rewrite_elf_program_header (bfd *ibfd, bfd *obfd, bfd_vma maxpagesize)
                      : 0),
                       output_section->alignment_power * opb)
              != output_section->lma * opb)
+             && (map->p_paddr != output_section->lma * opb))
            goto sorry;
            }
          else
github-actions[bot] commented 1 year ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

ghost commented 1 year ago

The dev branch in PR #60400 hits the same bug (see https://github.com/zephyrproject-rtos/zephyr/actions/runs/5861777078/job/15892496313?pr=60400), this time the offender is tests/boot/test_mcuboot/boot.mcuboot.assert which is not yet skipped. (Same root cause, though - it's the same little app that triggers the issue.)

@stephanosio Should that one be skipped as well?

github-actions[bot] commented 1 year ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

gmarull commented 1 year ago

Why do bugs get the stale label? I guess many problems get forgotten and lost just because of this bot action. @carlescufi @stephanosio

stephanosio commented 1 year ago

Why do bugs get the stale label? I guess many problems get forgotten and lost just because of this bot action. @carlescufi @stephanosio

To remind the stakeholders the existence of the bug and keep the bug count under control by closing any bugs that have no stakeholders -- the idea is that, if there are any stakeholders, they would remove the "stale" label, as seen above.

ghost commented 1 year ago

@stephanosio @danieldegrasse I believe that #60934 should be reverted ASAP or at least together with a root cause solution to this issue as proposed by @stephanosio . #60934 is a workaround plus it doesn't even fix the alignment issue if the MPU aligment is smaller than the concerned section's VMA alignment. I see failing tests with #60934 applied, with the required section alignment being 2**3 while the MPU alignment is 2**2.

Do you agree?

github-actions[bot] commented 11 months ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

nordicjm commented 10 months ago

@stephanosio any plans to get this upstreamed into gcc?

stephanosio commented 10 months ago

@stephanosio any plans to get this upstreamed into gcc?

I will release an SDK RC with the proposed patch above next week.

stephanosio commented 9 months ago

Closing this since Zephyr SDK 0.16.5 with the binutils fix for this issue has been mainlined.

danieldegrasse commented 9 months ago

@stephanosio @danieldegrasse I believe that https://github.com/zephyrproject-rtos/zephyr/pull/60934 should be reverted ASAP or at least together with a root cause solution to this issue as proposed by @stephanosio . https://github.com/zephyrproject-rtos/zephyr/pull/60934 is a workaround plus it doesn't even fix the alignment issue if the MPU aligment is smaller than the concerned section's VMA alignment. I see failing tests with https://github.com/zephyrproject-rtos/zephyr/pull/60934 applied, with the required section alignment being 23 while the MPU alignment is 22.

Do you agree?

@fgrandel would you like to open an issue to get #60934 reverted? If you can point to a specific bug the commit causes, we might be able to make the change prior to the release

ghost commented 6 months ago

@fgrandel would you like to open an issue to get #60934 reverted? If you can point to a specific bug the commit causes, we might be able to make the change prior to the release

@danieldegrasse Sorry, saw your comment only now, revisiting this issue. I seems to me that the additional Kconfig switch is redundant in the current mainline now that the underlying root cause was fixed. It might cause bugs in the future when the condition described in my original message applies. I might be mistaken, though, as I might misunderstand the intent of the CONFIG_BUILD_ALIGN_LMA switch in the first place.

danieldegrasse commented 6 months ago

@fgrandel would you like to open an issue to get #60934 reverted? If you can point to a specific bug the commit causes, we might be able to make the change prior to the release

@danieldegrasse Sorry, saw your comment only now, revisiting this issue. I seems to me that the additional Kconfig switch is redundant in the current mainline now that the underlying root cause was fixed. It might cause bugs in the future when the condition described in my original message applies. I might be mistaken, though, as I might misunderstand the intent of the CONFIG_BUILD_ALIGN_LMA switch in the first place.

No worries, you're understanding the purpose of CONFIG_BUILD_ALIGN_LMA exactly. https://github.com/zephyrproject-rtos/zephyr/pull/72376 should revert the commit adding it, so this workaround will no longer be in tree