zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.83k stars 6.6k forks source link

Unaligned memory access by ldrd #15605

Closed bublover closed 5 years ago

bublover commented 5 years ago

Describe the bug Running zperf test, I always found unaligned memory access fault with CONFIG_SPEED_OPTIMIZATIONS=y.

To Reproduce Steps to reproduce the behavior:

  1. Input command in shell.
    zperf udp download
  2. Input command in PC.
    iperf -l 1K -c 192.168.1.105 -p 5001 -i 1 -u
  3. See error

Error from this line 0x2015f38 <udp_received+192> ldrd r3, r2, [r8, #4]
r8 is not 4-byte aligned.

   │0x2015f14 <udp_received+156>    bl     0x202aed8 <net_pkt_remaining_data>                                                                                                                              │
   │0x2015f18 <udp_received+160>    ldrd   r2, r3, [r5, #40]       ; 0x28                                                                                                                                  │
   │0x2015f1c <udp_received+164>    adds   r6, r2, r0                                                                                                                                                      │
   │0x2015f1e <udp_received+166>    adc.w  r7, r3, #0                                                                                                                                                      │
   │0x2015f22 <udp_received+170>    mov    r0, r6                                                                                                                                                          │
   │0x2015f24 <udp_received+172>    mov    r1, r7                                                                                                                                                          │
   │0x2015f26 <udp_received+174>    strd   r0, r1, [r5, #40]       ; 0x28                                                                                                                                  │
   │0x2015f2a <udp_received+178>    add    r3, pc, #548    ; (adr r3, 0x2016150 <udp_received+728>)                                                                                                        │
   │0x2015f2c <udp_received+180>    ldrd   r2, r3, [r3]                                                                                                                                                    │
   │0x2015f30 <udp_received+184>    umull  r0, r1, r10, r9                                                                                                                                                 │
   │0x2015f34 <udp_received+188>    bl     0x20120e0 <__aeabi_uldivmod>                                                                                                                                    │
  >│0x2015f38 <udp_received+192>    ldrd   r3, r2, [r8, #4]                                                                                                                                                │
   │0x2015f3c <udp_received+196>    rev    r2, r2                                                                                                                                                          │
   │0x2015f3e <udp_received+198>    rev    r3, r3                                                                                                                                                          │
   │0x2015f40 <udp_received+200>    mla    r3, r9, r3, r2                                                                                                                                                  │
   │0x2015f44 <udp_received+204>    cmp    r0, r3      
r0             0x270324            2556708
r1             0x0                 0
r2             0xa7d8c00           176000000
r3             0x0                 0
r4             0x113888            1128584
r5             0x1012e8            1053416
r6             0x400               1024
r7             0x0                 0
r8             0x11438a            1131402
r9             0xf4240             1000000
r10            0x3f651b30          1063590704
r11            0x30                48
r12            0x3                 3
sp             0x10e208            0x10e208 <rx_stack+1152>
lr             0x2015f39           33644345
pc             0x2015f38           0x2015f38 <udp_received+192>

C code:

             /* Compute jitter */                                                                                                                                                                        
             transit_time = time_delta(HW_CYCLES_TO_USEC(time),
                           ntohl(hdr->tv_sec) *
                           USEC_PER_SEC +
                           ntohl(hdr->tv_usec));
***** USAGE FAULT *****
  Unaligned memory access
***** Hardware exception *****
Current thread ID = 0x0010826c
Faulting instruction address = 0x2015f38
Fatal fault in thread 0x0010826c! Aborting.

Expected behavior r8 is 4-byte aligned.

Impact Function not working.

Environment (please complete the following information):

Additional context If CONFIG_SPEED_OPTIMIZATIONS is disabled, nothing can be found.

bublover commented 5 years ago

"If you are creating misaligned pointers to structs or double words, then you must use the __packed qualifier in order to ensure that the compiler does not use LDRD and LDM instructions,"

Reference: https://community.arm.com/developer/ip-products/processors/f/cortex-a-forum/7179/hard-fault-in-cortex-m4

yashi commented 5 years ago

Are we sure we want to add __packed here? The UDP header itself is 4 bytes aligned (Ethernet frame isn't). We also have slab allocator to align the packet buffer. So, can we just align the packet buffer, if not?

If we need __packed here, don't we need every other field as well?

LDRD just need word-aligned. Can you share the generated instructions with __packed? I assume __packed tells compilers to generate byte access instructions.

bublover commented 5 years ago

Are we sure we want to add __packed here? The UDP header itself is 4 bytes aligned (Ethernet frame isn't). We also have slab allocator to align the packet buffer. So, can we just align the packet buffer, if not?

If we need __packed here, don't we need every other field as well?

LDRD just need word-aligned. Can you share the generated instructions with __packed? I assume __packed tells compilers to generate byte access instructions.

Sorry for confusing. "If you are creating misaligned pointers to structs or double words, then you must use the __packed qualifier in order to ensure that the compiler does not use LDRD and LDM instructions,"

https://community.arm.com/developer/ip-products/processors/f/cortex-a-forum/7179/hard-fault-in-cortex-m4

__packed will tell comipler not use double word instruction. In this case, ldr is proper. I did not save the zephyr.list.

We also have slab allocator to align the packet buffer.

@jukkar Could you please help make a comment on this?

jukkar commented 5 years ago

Please note that the zperf_udp_datagram is just cast to arbitrary memory location of the received net_buf. It might not be aligned to word boundary.