Closed gmarull closed 9 months ago
The problem is that the BFD program header rewrite implementation makes a questionable assumption that the LMA of a segment will be aligned to the alignment of the first section in the segment:
    /* If the first section in a segment does not start at
       the beginning of the segment, then something is
       wrong.  */
    if (align_power (map->p_paddr
                     + (map->includes_filehdr
                        ? iehdr->e_ehsize : 0)
                     + (map->includes_phdrs
                        ? iehdr->e_phnum * iehdr->e_phentsize
                        : 0),
                     output_section->alignment_power * opb)
        != output_section->lma * opb)
      goto sorry;
This is not a problem when the LMA is equal to the VMA and both are aligned at the section alignment; but, for the sections that have different VMA and LMA, only the VMA is guaranteed to be aligned at the section alignment and the LMA may or may not be aligned at the same boundary.
Now looking at the cases with next_event_cyc = new_next_event_cyc; (bad) and without (good).
Good:
Program Header:
[...]
LOAD off 0x00005044 vaddr 0x20000000 paddr 0x00010f64 align 2**2
filesz 0x000000b8 memsz 0x000000b8 flags rw-
[...]
Sections:
Idx Name Size VMA LMA File off Algn
[...]
10 datas 000000b0 20000000 00010f64 00005044 2**2
CONTENTS, ALLOC, LOAD, DATA
[...]
Bad:
Program Header:
[...]
LOAD off 0x00005098 vaddr 0x20000000 paddr 0x00010fb4 align 2**3
filesz 0x000000c0 memsz 0x000000c0 flags rw-
[...]
Sections:
Idx Name Size VMA LMA File off Algn
[...]
10 datas 000000b8 20000000 00010fb4 00005098 2**3
CONTENTS, ALLOC, LOAD, DATA
[...]
(the full dump can be found in https://gist.github.com/stephanosio/5946bd6f8f249b5329fb4982f87cd58e)
Note that, for both good and bad, the datas section is the first section in a segment.
For good, the LMA is 0x10f64 which is aligned at 2**2.
For bad, the LMA is 0x10fb4 which is NOT aligned at 2**3.
As seen above, the BFD implementation expects the LMA of a segment to be aligned at the alignment of the first section of the segment, and attempts to align the address of the segment LMA to the alignment of the first section before comparing it to the actual LMA of the first section for the purpose of verifying that the first section starts at the beginning of the segment.
When running objcopy --change-section-lma *+483328 ..., which offsets the LMA by 0x76000:
In case of good, since the segment LMA is divisible by 4 (2**2), this results in 0x86f64 == 0x86f64, which evaluates to true.
In case of bad, since the segment LMA is NOT divisible by 8 (2**3), this results in 0x86fb8 == 0x86fb4, which evaluates to false.
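The arithmetic above can be reproduced with a small sketch. It is not the BFD code: `align_up_pow2` is a stand-in written for this example and modeled on what `align_power()` does (round up to a multiple of 2**power), `original_check_fails` is a hypothetical name, and `opb` is assumed to be 1 with no file or program headers in the segment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for BFD's align_power(): round addr up to the next
   multiple of 2**power (assumption: same rounding behaviour). */
uint64_t align_up_pow2(uint64_t addr, unsigned power)
{
    uint64_t mask = ((uint64_t)1 << power) - 1;
    return (addr + mask) & ~mask;
}

/* Mirrors the original check for a header-less segment with opb == 1.
   Returns true when the check would take the "goto sorry" path. */
bool original_check_fails(uint64_t segment_paddr, uint64_t section_lma,
                          unsigned alignment_power)
{
    return align_up_pow2(segment_paddr, alignment_power) != section_lma;
}
```

With the values from the dumps, `original_check_fails(0x10f64 + 0x76000, 0x10f64 + 0x76000, 2)` is false (good), while `original_check_fails(0x10fb4 + 0x76000, 0x10fb4 + 0x76000, 3)` is true (bad), because 0x86fb4 rounds up to 0x86fb8.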
Since the align_power() is applied on the segment LMA mainly for the purpose of handling the alignment in case there are file or program headers in the segment (the expectation is that, if there are file and program headers in a segment, then the linker would have respected the section alignment for the first section), adding an alternate check without the segment LMA alignment for the header-less case would make sense to support the VMA != LMA case.
diff --git a/bfd/elf.c b/bfd/elf.c
index 79f71aa81e2..a0b65c4fd3b 100644
--- a/bfd/elf.c
+++ b/bfd/elf.c
@@ -7412,7 +7412,7 @@ rewrite_elf_program_header (bfd *ibfd, bfd *obfd, bfd_vma maxpagesize)
/* If the first section in a segment does not start at
the beginning of the segment, then something is
wrong. */
- if (align_power (map->p_paddr
+ if ((align_power (map->p_paddr
+ (map->includes_filehdr
? iehdr->e_ehsize : 0)
+ (map->includes_phdrs
@@ -7420,6 +7420,7 @@ rewrite_elf_program_header (bfd *ibfd, bfd *obfd, bfd_vma maxpagesize)
: 0),
output_section->alignment_power * opb)
!= output_section->lma * opb)
+ && (map->p_paddr != output_section->lma * opb))
goto sorry;
}
else
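As a rough illustration of how the patched condition behaves, here is a minimal sketch of the combined predicate. The function names and the `header_bytes` parameter are hypothetical and introduced only for this example; `opb` is assumed to be 1, and `align_up_pow2` is a stand-in for `align_power()`, not the BFD macro itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for BFD's align_power(): round addr up to the next
   multiple of 2**power (assumption: same rounding behaviour). */
uint64_t align_up_pow2(uint64_t addr, unsigned power)
{
    uint64_t mask = ((uint64_t)1 << power) - 1;
    return (addr + mask) & ~mask;
}

/* Sketch of the patched condition with opb == 1.
   header_bytes models the ELF/program header bytes that precede the
   first section when the segment includes them (0 otherwise).
   Returns true only when BOTH comparisons mismatch, i.e. when the
   patched code would still take the "goto sorry" path. */
bool patched_check_fails(uint64_t segment_paddr, uint64_t header_bytes,
                         uint64_t section_lma, unsigned alignment_power)
{
    bool aligned_mismatch =
        align_up_pow2(segment_paddr + header_bytes, alignment_power)
        != section_lma;
    bool exact_mismatch = segment_paddr != section_lma;
    return aligned_mismatch && exact_mismatch;
}
```

For the bad case above, `patched_check_fails(0x86fb4, 0, 0x86fb4, 3)` is false, since the section LMA equals the segment LMA exactly. A section that starts neither at the segment LMA nor at the aligned address past the headers is still rejected.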
This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.
The dev branch in PR #60400 hits the same bug (see https://github.com/zephyrproject-rtos/zephyr/actions/runs/5861777078/job/15892496313?pr=60400), this time the offender is tests/boot/test_mcuboot/boot.mcuboot.assert which is not yet skipped. (Same root cause, though - it's the same little app that triggers the issue.)
@stephanosio Should that one be skipped as well?
Why do bugs get the stale label? I guess many problems get forgotten and lost just because of this bot action. @carlescufi @stephanosio
To remind the stakeholders of the existence of the bug and to keep the bug count under control by closing any bugs that have no stakeholders. The idea is that, if there are any stakeholders, they would remove the "stale" label, as seen above.
@stephanosio @danieldegrasse I believe that #60934 should be reverted ASAP, or at least together with a root cause solution to this issue as proposed by @stephanosio. #60934 is a workaround, and it doesn't even fix the alignment issue if the MPU alignment is smaller than the concerned section's VMA alignment. I see failing tests with #60934 applied, with the required section alignment being 2**3 while the MPU alignment is 2**2.
Do you agree?
@stephanosio any plans to get this upstreamed into gcc?
I will release an SDK RC with the proposed patch above next week.
Closing this since Zephyr SDK 0.16.5 with the binutils fix for this issue has been mainlined.
@fgrandel would you like to open an issue to get #60934 reverted? If you can point to a specific bug the commit causes, we might be able to make the change prior to the release
@danieldegrasse Sorry, saw your comment only now, revisiting this issue. It seems to me that the additional Kconfig switch is redundant in the current mainline now that the underlying root cause was fixed. It might cause bugs in the future when the condition described in my original message applies. I might be mistaken, though, as I might misunderstand the intent of the CONFIG_BUILD_ALIGN_LMA switch in the first place.
No worries, you're understanding the purpose of CONFIG_BUILD_ALIGN_LMA exactly. https://github.com/zephyrproject-rtos/zephyr/pull/72376 should revert the commit adding it, so this workaround will no longer be in tree
Describe the bug
I discovered a heisenbug while adding this new feature to Zephyr: https://github.com/zephyrproject-rtos/zephyr/pull/57229. Some failures with objcopy occur when using CONFIG_BUILD_OUTPUT_ADJUST_LMA, e.g. when compiling some MCUboot tests.
To Reproduce
Use https://github.com/zephyrproject-rtos/zephyr/pull/57229 branch and build:
If the following patch is applied to the branch, the problem disappears:
Expected behavior Build succeeds
Impact Blocker
Logs and console output
Environment (please complete the following information):
Additional context N/A