Closed r2r0 closed 3 years ago
Hi,
Thank you for reporting the issues. I will investigate that. By hand I have the board with Cortex-M0 (also 32-bit) and it doesn't have such issue. Need some time to figure out the root cause.
PS. But why 8 and not 4 ?
What if to play with CCR.STKALIGN=1 in your code?
Is it the issue with C++ classes only? What about C-style API?
Just checked the code on Cortex-A53, no issues with alignment.
Another idea is to add __attribute__((__packed__))
to the structures declarations
Thanks for quick reply.
I use Zephyr RTOS and it has already set 8 bytes stack alignment (because MPU is used for stack overflow protection). I think that stack alignment should have no influence on problem because I also used FdD for testing (with the same effect) where all data is allocated on heap.
I did not try C API yet.
I also tried packed structures and it works but it makes access to their fields quite expensive (comparing to unpacked/aligned structures).
IMHO if code is doing pointer arithemetic and then code uses such pointer as an addres of structure (i.e. _init.buf assignment) then it is required that such (calulated) address is properly aligned.
Unfortunately in 99.99% cases you will get proper results but i.e. compiler version change can trigger unaligned access. I also tried to run my test on QEMU for Cortex-M3 and there was no problem, so for me it means that problem is triggered by some very minor change in compiler options.
I also tried to run my test on QEMU for Cortex-M3 and there was no problem, so for me it means that problem is triggered by some very minor change in compiler options.
Can you capture the compiler options in both cases?
IMHO if code is doing pointer arithemetic and then code uses such pointer as an addres of structure
Almost all code in C / C++ does pointer arithmetic, and uses such pointers. The raw pointers are the basic of C. And doing manual alignment every time you use the pointer would be real headache for all developers. But that's not true. I agree, that sometimes to speed up a program execution used structures must be aligned in memory, but UsageFault is something specific to the platform, and we need to find the root cause for that to properly fix the issue. The alignment to 8 bytes will not be good solution for other platforms.
Of course, it is possible to make quick fix with #ifdef
to support your case.
PS. On ARM website, I have found 2 compiler options --unaligned_access, --no_unaligned_access
The documentation for STM32W55 points out that this CPU has Cortex-M4 core and according to ARM website Cortex-M4 has Armv7E-M architecture. The ARM documentation on the compiler options says that Armv7E-M supports unaligned access.
PPS. And I'm thinking if aligned_alloc
instead of new can help (: IFd(new uint8_t[size+7], size+7)
)
PPPS.
uintptr_t m_data[(S + sizeof(uintptr_t ) - 1) / sizeof(uintptr_t )]
You are right - STM32WB55 is obviously Cortex-M4. I am sorry for messing this - I was just after build for QEMU Cortex-M3 and I had compilation output on screen when writing issue description.
I attached compiler command line for both platforms.
Except difference in mcu option the interesting part is that _FORTIFY_SOURCE=2
is used only for Cortex-M4.
I also found something similiar problem described on stackoverflow: https://stackoverflow.com/questions/59592724/why-am-i-getting-an-unaligned-memory-access-fault-cortex-m4
I did 8 bytes alignment in example because I also tried it on x86_64 and there is 8 bytes alignement for some pointers.
So your idea to use sizeof(uintptr_t)
as alignment value seems to be very good.
At least on my platform aligned_alloc
is problematic (linker error: undefined reference to `posix_memalign').
If you wrap aligned allocation in some HAL/platform wrapper then it can be very good solution (i.e. in Zephyr I have k_aligned_alloc
).
I added some alignment implementation on the master branch. If you have a time to test it, it would be very helpful.
Hi, I've tested revision 46993c8, issue is still visible. What I noticed is that calling setWindowSize function with argument different than 4 will cause HF. Disassembly shows that HF is caused by STRD instruction. Pointer ptr value that is problematic came from calculation in tiny_fd_init() before call of hdlc_ll_init().
Hi,
Ok, what we have:
I also tried packed structures and it works but it makes access to their fields quite expensive (comparing to unpacked/aligned structures).
And here is the link to documntation: https://developer.arm.com/documentation/100748/0609/writing-optimized-code/packing-data-structures
If a member of a structure or union is packed and therefore does not have its natural alignment, then to access this member, you must use the structure or union that contains this member. You must not take the address of such a packed member to use as a pointer, because the pointer might be unaligned. Dereferencing such a pointer can be unsafe even when unaligned accesses are supported by the target, because certain instructions always require word-aligned addresses.
uint8_t m_data[S] attribute ((aligned (8))) {}; ... : IFd(new uint8_t[size+7], size+7) ... uint32_t ptrValue = (uint32_t)ptr; ptrValue = ((ptrValue + 7U) / 8U) 8U; _init.buf = (void)ptrValue;
What I noticed is that calling setWindowSize function with argument different than 4 will cause HF. Disassembly shows that HF is caused by STRD instruction
But it does almost nothing, just sets member field of the class. How it can fail?
void setWindowSize(uint8_t window)
{
m_window = window;
}
tiny_fd_data_t
is always aligned.Currently alignment is set to 4 bytes for Cortex-M4:
#if defined(__TARGET_CPU_CORTEX_M0) || defined(__TARGET_CPU_CORTEX_M0_) || defined(__ARM_ARCH_6M__) || \
defined(__TARGET_CPU_CORTEX_M3) || defined(__TARGET_CPU_CORTEX_M4) || defined(__ARM_ARCH_7EM__) || \
defined(__ARM_ARCH_7M__)
#define TINY_ALIGNED_PTR TINY_ALIGNED(sizeof(uintptr_t))
#else
#define TINY_ALIGNED_PTR TINY_ALIGNED(sizeof(uintptr_t))
#endif
But you can change that to 8 bytes by replacing sizeof(uintptr_t)
with 8
, or even you can try to use alignment 16.
It seems that last commit fixed issue, at least I'm not able to reproduce it. Let's wait for final confirmation from @r2r0
HI @Zzzwierzak
That's nice to hear that. I added alignment checks for the provided buffer to hdlc and fd initialization code. However, from the API standpoint maybe a better way is to remove so strict requirements for the user buffer, and automatically allocate a aligned space for the structures inside the buffer, as @r2r0 mentioned in initial post. Let me know what you think.
Hi @lexus2k
Thanks for your support - present version works well in our application.
In fact alignment requirements for user supplied bufer can be sometimes problematic.
Probably the definition of TINY_ALIGN_STRUCT_VALUE can be unified to sizeof(uintptr_t) - I used in my sample hardcoded value of 8 only for quick test on x86 platform but sizeof(uintptr_t) seems to be much more elegant solution.
I wonder if allocation in FdD::FdD should also use TINY_ALIGN_STRUCT_VALUE (istead of sizeof(uintptr_t)).
HI
I wonder if allocation in FdD::FdD should also use TINY_ALIGN_STRUCT_VALUE (istead of sizeof(uintptr_t)).
You're absolutely correct. I've just fixed that in last commit.
In fact alignment requirements for user supplied bufer can be sometimes problematic.
Ok, I will remove strict verififcations from HDLC and FD.
Probably the definition of TINY_ALIGN_STRUCT_VALUE can be unified to sizeof(uintptr_t) - I used in my sample hardcoded value of 8 only for quick test on x86 platform but sizeof(uintptr_t) seems to be much more elegant solution.
Can you confirm that sizeof(uintptr_t)
works for you without any issues? If yes, I will update the header file.
Yes, I confirm that sizeof(uintptr_t) works for me.
@r2r0 Thank you very much.
I'm closing the issue for now, as the problem is considered to be fixed. If you have any questions and notes, please let me know.
I got usage fault beacuse of unaligned memory on STM32WB55 (Cortex-M4) when accessing structure fields holding function pointers (callbacks) i.e. function hdlc_ll_reset is failing at line:
handle->rx.state = hdlc_ll_read_start;
.For confirmation I added (dirty hack) alignment to memory pointers used to initialize buf field in hdlc_ll_init_t and such change allows code to execute properly: TinyProtocolFd.h:301
TinyProtocolFd.h:317
tiny_fd.c:584:
Compiler GCC 10.2.0