lexus2k / tinyproto

Tiny Software Protocol for communication over UART, SPI, etc
GNU General Public License v3.0
240 stars 51 forks source link

Wrong alignement for function pointers (callbacks) causes CPU fault on Coretex-M4 #17

Closed r2r0 closed 3 years ago

r2r0 commented 3 years ago

I got usage fault beacuse of unaligned memory on STM32WB55 (Cortex-M4) when accessing structure fields holding function pointers (callbacks) i.e. function hdlc_ll_reset is failing at line: handle->rx.state = hdlc_ll_read_start;.

For confirmation I added (dirty hack) alignment to memory pointers used to initialize buf field in hdlc_ll_init_t and such change allows code to execute properly: TinyProtocolFd.h:301

uint8_t m_data[S] __attribute__ ((aligned (8))) {};

TinyProtocolFd.h:317

    : IFd(new uint8_t[size+7], size+7)

tiny_fd.c:584:

uint32_t ptrValue = (uint32_t)ptr;
ptrValue = ((ptrValue + 7U) / 8U) * 8U;
_init.buf = (void*)ptrValue;

Compiler GCC 10.2.0

lexus2k commented 3 years ago

Hi,

Thank you for reporting the issues. I will investigate that. By hand I have the board with Cortex-M0 (also 32-bit) and it doesn't have such issue. Need some time to figure out the root cause.

PS. But why 8 and not 4 ?

lexus2k commented 3 years ago

What if to play with CCR.STKALIGN=1 in your code?

Is it the issue with C++ classes only? What about C-style API?

Just checked the code on Cortex-A53, no issues with alignment.

Another idea is to add __attribute__((__packed__)) to the structures declarations

r2r0 commented 3 years ago

Thanks for quick reply.

I use Zephyr RTOS and it has already set 8 bytes stack alignment (because MPU is used for stack overflow protection). I think that stack alignment should have no influence on problem because I also used FdD for testing (with the same effect) where all data is allocated on heap.

I did not try C API yet.

I also tried packed structures and it works but it makes access to their fields quite expensive (comparing to unpacked/aligned structures).

IMHO if code is doing pointer arithemetic and then code uses such pointer as an addres of structure (i.e. _init.buf assignment) then it is required that such (calulated) address is properly aligned.

Unfortunately in 99.99% cases you will get proper results but i.e. compiler version change can trigger unaligned access. I also tried to run my test on QEMU for Cortex-M3 and there was no problem, so for me it means that problem is triggered by some very minor change in compiler options.

lexus2k commented 3 years ago

I also tried to run my test on QEMU for Cortex-M3 and there was no problem, so for me it means that problem is triggered by some very minor change in compiler options.

Can you capture the compiler options in both cases?

IMHO if code is doing pointer arithemetic and then code uses such pointer as an addres of structure

Almost all code in C / C++ does pointer arithmetic, and uses such pointers. The raw pointers are the basic of C. And doing manual alignment every time you use the pointer would be real headache for all developers. But that's not true. I agree, that sometimes to speed up a program execution used structures must be aligned in memory, but UsageFault is something specific to the platform, and we need to find the root cause for that to properly fix the issue. The alignment to 8 bytes will not be good solution for other platforms.

Of course, it is possible to make quick fix with #ifdef to support your case.

PS. On ARM website, I have found 2 compiler options --unaligned_access, --no_unaligned_access The documentation for STM32W55 points out that this CPU has Cortex-M4 core and according to ARM website Cortex-M4 has Armv7E-M architecture. The ARM documentation on the compiler options says that Armv7E-M supports unaligned access.

PPS. And I'm thinking if aligned_alloc instead of new can help (: IFd(new uint8_t[size+7], size+7))

PPPS.

uintptr_t m_data[(S + sizeof(uintptr_t ) - 1) / sizeof(uintptr_t )]
r2r0 commented 3 years ago

You are right - STM32WB55 is obviously Cortex-M4. I am sorry for messing this - I was just after build for QEMU Cortex-M3 and I had compilation output on screen when writing issue description.

I attached compiler command line for both platforms. Except difference in mcu option the interesting part is that _FORTIFY_SOURCE=2 is used only for Cortex-M4.

I also found something similiar problem described on stackoverflow: https://stackoverflow.com/questions/59592724/why-am-i-getting-an-unaligned-memory-access-fault-cortex-m4

I did 8 bytes alignment in example because I also tried it on x86_64 and there is 8 bytes alignement for some pointers. So your idea to use sizeof(uintptr_t) as alignment value seems to be very good.

At least on my platform aligned_alloc is problematic (linker error: undefined reference to `posix_memalign'). If you wrap aligned allocation in some HAL/platform wrapper then it can be very good solution (i.e. in Zephyr I have k_aligned_alloc).

compilation_options.txt

lexus2k commented 3 years ago

I added some alignment implementation on the master branch. If you have a time to test it, it would be very helpful.

Zzzwierzak commented 3 years ago

Hi, I've tested revision 46993c8, issue is still visible. What I noticed is that calling setWindowSize function with argument different than 4 will cause HF. Disassembly shows that HF is caused by STRD instruction. Pointer ptr value that is problematic came from calculation in tiny_fd_init() before call of hdlc_ll_init().

lexus2k commented 3 years ago

Hi,

Ok, what we have:

  1. According to previos posts packed structures with alignment 1 byte work for you, but they shouldn't according to ARM documentation:

I also tried packed structures and it works but it makes access to their fields quite expensive (comparing to unpacked/aligned structures).

And here is the link to documntation: https://developer.arm.com/documentation/100748/0609/writing-optimized-code/packing-data-structures

If a member of a structure or union is packed and therefore does not have its natural alignment, then to access this member, you must use the structure or union that contains this member. You must not take the address of such a packed member to use as a pointer, because the pointer might be unaligned. Dereferencing such a pointer can be unsafe even when unaligned accesses are supported by the target, because certain instructions always require word-aligned addresses.

  1. @r2r0 mentioned that 8 bytes alignment works for you:

uint8_t m_data[S] attribute ((aligned (8))) {}; ... : IFd(new uint8_t[size+7], size+7) ... uint32_t ptrValue = (uint32_t)ptr; ptrValue = ((ptrValue + 7U) / 8U) 8U; _init.buf = (void)ptrValue;

  1. You mentioned the problem with setWindowSize() function.

What I noticed is that calling setWindowSize function with argument different than 4 will cause HF. Disassembly shows that HF is caused by STRD instruction

But it does almost nothing, just sets member field of the class. How it can fail?

    void setWindowSize(uint8_t window)
    {
        m_window = window;
    }
  1. At least with the latest commits I added alignment for raw buffer allocation. so the structure tiny_fd_data_t is always aligned.

Currently alignment is set to 4 bytes for Cortex-M4:

#if defined(__TARGET_CPU_CORTEX_M0) || defined(__TARGET_CPU_CORTEX_M0_) || defined(__ARM_ARCH_6M__) || \
    defined(__TARGET_CPU_CORTEX_M3) || defined(__TARGET_CPU_CORTEX_M4) || defined(__ARM_ARCH_7EM__) || \
    defined(__ARM_ARCH_7M__)

#define TINY_ALIGNED_PTR   TINY_ALIGNED(sizeof(uintptr_t))

#else

#define TINY_ALIGNED_PTR   TINY_ALIGNED(sizeof(uintptr_t))

#endif

But you can change that to 8 bytes by replacing sizeof(uintptr_t) with 8, or even you can try to use alignment 16.

  1. To understand how the code works in your case, I need some simple example from you with the protocol initialization and failing functions.
Zzzwierzak commented 3 years ago

It seems that last commit fixed issue, at least I'm not able to reproduce it. Let's wait for final confirmation from @r2r0

lexus2k commented 3 years ago

HI @Zzzwierzak

That's nice to hear that. I added alignment checks for the provided buffer to hdlc and fd initialization code. However, from the API standpoint maybe a better way is to remove so strict requirements for the user buffer, and automatically allocate a aligned space for the structures inside the buffer, as @r2r0 mentioned in initial post. Let me know what you think.

r2r0 commented 3 years ago

Hi @lexus2k

Thanks for your support - present version works well in our application.

In fact alignment requirements for user supplied bufer can be sometimes problematic.

Probably the definition of TINY_ALIGN_STRUCT_VALUE can be unified to sizeof(uintptr_t) - I used in my sample hardcoded value of 8 only for quick test on x86 platform but sizeof(uintptr_t) seems to be much more elegant solution.

I wonder if allocation in FdD::FdD should also use TINY_ALIGN_STRUCT_VALUE (istead of sizeof(uintptr_t)).

lexus2k commented 3 years ago

HI

I wonder if allocation in FdD::FdD should also use TINY_ALIGN_STRUCT_VALUE (istead of sizeof(uintptr_t)).

You're absolutely correct. I've just fixed that in last commit.

In fact alignment requirements for user supplied bufer can be sometimes problematic.

Ok, I will remove strict verififcations from HDLC and FD.

Probably the definition of TINY_ALIGN_STRUCT_VALUE can be unified to sizeof(uintptr_t) - I used in my sample hardcoded value of 8 only for quick test on x86 platform but sizeof(uintptr_t) seems to be much more elegant solution.

Can you confirm that sizeof(uintptr_t) works for you without any issues? If yes, I will update the header file.

r2r0 commented 3 years ago

Yes, I confirm that sizeof(uintptr_t) works for me.

lexus2k commented 3 years ago

@r2r0 Thank you very much.

lexus2k commented 3 years ago

I'm closing the issue for now, as the problem is considered to be fixed. If you have any questions and notes, please let me know.