zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.07k stars 6.18k forks source link

aarch32 excn vector not pinned in mmu causing newlib heap overlap #51024

Closed yroeht closed 1 year ago

yroeht commented 1 year ago

Describe the bug

Concerning Zephyr v3.2.0, present since release v3.0.0, introduced by commit 70c403c215 - arch: arm: core: aarch32: introduce basic ARMv7 MMU support

The linker script for aarch32 cortex A and R places z_mapped_start after the exception vector section, just before the code text section. This causes the Newlib malloc_prepare to allocate a heap at the same physical address as the exception vector, because it's page frame is marked by mmu as available rather than pinned. The effect is that when allocated heap memory is written to, this overwrites the exception vector, causing undefined operations.

This does not affect aarch32 cortex M as it does not have MMU.

This does not affect minimal libc.

This does not affect aarch64 as that linker script places z_mapped_start before the exception vector.

This does not affect XIP targets.

I believe this only affects CPU_CORTEX_A9 with Newlib and no XIP.

To Reproduce This is what we did to encounter the problem:

  1. create a soc using CPU_CORTEX_A9 (we used another new ARMv7-A CPU_AARCH32_CORTEX_A which is ISA_THUMB2 but that should be irrelevant)
  2. create a board using that soc, and include in the _defconfig file CONFIG_HEAP_MEM_POOL_SIZE=4096 and CONFIG_NEWLIB_LIBC=y
  3. build using the hello_world sample or any sample for that matter
  4. execute and observe

Expected behavior I think the linker script for aarch32 cortex A should define z_mapped_start before the exception vector

Impact We have currently implemented a workaround by modifying as per above.

Logs and console output None.

Environment (please complete the following information): Debian, Zephyr SDK 0.15 toolchains, tag v3.1.0 2ddd73feaf

Proposed patch

diff --git a/include/zephyr/arch/arm/aarch32/cortex_a_r/scripts/linker.ld b/include/zephyr/arch/arm/aarch32/cortex_a_r/scripts/linker.ld
index 6be9667943..21863a8bd2 100644
--- a/include/zephyr/arch/arm/aarch32/cortex_a_r/scripts/linker.ld
+++ b/include/zephyr/arch/arm/aarch32/cortex_a_r/scripts/linker.ld
@@ -120,6 +120,11 @@ SECTIONS
  * zephyr_linker_sources(ROM_START ...). This typically contains the vector
  * table and debug information.
  */
+#ifndef CONFIG_XIP
+        z_mapped_start = .;
+#endif
 #include <snippets-rom-start.ld>

     } GROUP_LINK_IN(ROMABLE_REGION)
@@ -133,10 +138,6 @@ SECTIONS
     SECTION_PROLOGUE(_TEXT_SECTION_NAME,,)
     {
         . = ALIGN(_region_min_align);
         __text_region_start = .;
-#ifndef CONFIG_XIP
-        z_mapped_start = .;
-#endif

 #include <zephyr/linker/kobject-text.ld>

If you agree with my assessment of this bug, I can open a merge request.

ibirnbaum commented 1 year ago

Interesting as I have actually been building projects for aarch32 Cortex-A with Newlib as the active C library and without XIP. However, my heap mem pool size is significantly larger than 4k. I should get the chance tomorrow to verify the described issue on the Zynq-7000 with the trusty ole' Lauterbach debugger.

yroeht commented 1 year ago

Well actually the code is

max_heap_size = MIN(CONFIG_NEWLIB_LIBC_MAX_MAPPED_REGION_SIZE, k_mem_free_get());

So I take it CONFIG_HEAP_MEM_POOL_SIZE doesn't have any effect.

But we definitely should find out why it works on your side and not on ours.

yroeht commented 1 year ago

I have tried setting CONFIG_NEWLIB_LIBC_MIN_REQUIRED_HEAP_SIZE=1048576 but it still seems to mess with my vector.

FWIW the return value of k_mem_free_get is 1904640

yroeht commented 1 year ago

I have implemented #51035 and have CONFIG_SRAM_BASE_ADDRESS=0x8000000 (because of FastModel limitations on our side which are beyond my understanding at the moment), though it was my understanding this shouldn't matter.

yroeht commented 1 year ago

Interesting as I have actually been building projects for aarch32 Cortex-A with Newlib as the active C library and without XIP. However, my heap mem pool size is significantly larger than 4k. I should get the chance tomorrow to verify the described issue on the Zynq-7000 with the trusty ole' Lauterbach debugger.

Have you been able to look into this yet? I would be interested to know that it works for you.

ibirnbaum commented 1 year ago

When k_mem_map is called, it in turn calls map_anon_page in a loop until the desired amount of memory is mapped.

At the time being, the first call to map_anon_page will actually map the first 4k page whose PA is at the start of the Zephyr image in RAM, where the vectors are located. As those have been copied to address 0 by the time map_anon_page is first called, this re-use hasn't caused any damage so far - but if VBAR is used and the vectors are permanently used at that location, the system will crash. Any subsequent call of map_anon_page will assign memory above the end of the Zephyr image.

Even if this behaviour doesn't cause any damage in the scenario where the vectors are copied to address 0, we're definitely not getting contiguous memory when the first call to k_mem_map requests more than 4k.

github-actions[bot] commented 1 year ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

github-actions[bot] commented 1 year ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

github-actions[bot] commented 1 year ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.