Closed mniestroj closed 3 years ago
Thanks @mniestroj
An out of tree closed application code was used on nRF52840 with CONFIG_FPU=y and CONFIG_FPU_SHARING=n. Issue reproduced with some toolchains only, look at "Environment" below.
could you share some code example that can reproduce this issue?
The problem can be worked around by setting CONFIG_FPU_SHARING=y.
Well, this comes at some cost, however, so I'd better not apply this if there's a cleaner solution.
I added
CONFIG_MAIN_STACK_SIZE=4096
CONFIG_FPU=y
to tests/kernel/poll/prj.conf and executed scripts/twister -N --device-testing --device-serial /dev/ttyACM1 -p nrf52840dk_nrf52840 -T tests/kernel/poll/. It fails that way, but works when CONFIG_FPU_SHARING=y is added as well. Looking at zephyr.lst, the pattern is the same as I noticed originally (in this issue's description): the s16 register is used to store (cache) values in a function, but at the same time another thread doing the same overwrites it, so s16 is clobbered before the first thread reads it back.
When I did the initial investigation, it turned out that the compiler (here GCC) should avoid using FPU registers unless FPU operations (e.g. float multiplication) are involved. GCC can prevent all FPU registers from being used with the -mgeneral-regs-only command-line option, but then no operations on float will work. So my conclusion was that GCC does not support CONFIG_FPU_SHARING=n, at least based on the output for ARM Cortex-M4F. The only solution, in my opinion, is to enable CONFIG_FPU_SHARING and maybe allow users to disable it if they are sure the application won't be affected. However, I am not an expert in the compiler area, so maybe I am missing something.
I've read the same thing here: https://gcc.gnu.org/pipermail/gcc-help/2020-July/139112.html, which aligns with your investigation.
The problem is that we have made assumptions based on the K_FP_REGS thread-create flag: a thread created without it won't use the FP registers, so its stack frame will never need to contain them. If this is not the case, we might have to do more than just enable FPU_SHARING together with FPU.
I agree, all threads need to be treated as if K_FP_REGS were set for them. As a short-term solution, that flag could be set automatically in arch_new_thread() for the affected <compiler, arch> pair (a combination that produces code using FP registers for temporary storage).
So in your understanding, this is just a subset of GCC versions that do this, or all GCC compilers (beyond some release) behave like this consistently?
I would suggest treating all GCC versions the same and assuming that any of them might produce code that uses FP registers. I haven't found any option for configuring this behavior in the GCC codebase, nor any statement in GCC's documentation about one behavior or the other. In summary, it is more or less undefined whether FP (or any other non-general-purpose) registers are used. The issue described here was not reproducible with -O0, but bisecting the optimization flags didn't identify a single one responsible for using FP registers.
I was not able to reproduce this Issue with GCC 8, but it was reproducible with GCC 9 and 10. My suggestion with <compiler, arch> pair was to treat other compilers (like clang) differently from GCC if needed.
@mniestroj thanks for your input here
I agree, all threads need to be treated like K_FP_REGS would be set for them. As a short term solution that flag could be set automatically in arch_new_thread() for affected <compiler, arch> pair (combination that produces code with FP registers for temporary storage).
this comes at a high cost, though. Wondering:
Does GCC behave as described only under a certain -O configuration? If so then at least we can limit the problematic cases to a subset of optimization configurations.
this comes at a high cost, though.
Unfortunately. As far as I can see, for ARM K_FP_REGS "only" affects stack size/usage, but the situation is worse for other archs. I am afraid we cannot do much without explicit compiler support for "do not use FP registers for non-FPU math operations".
Does GCC behave as described only under a certain -O configuration? If so then at least we can limit the problematic cases to a subset of optimization configurations.
-O0 did not use FP
-O1 (probably) used FP - I don't remember for sure now
-O2 and -Os - used FP
-O0 did not use FP
Warning: it did not use FP registers here, but that doesn't mean it never will. We need a guarantee that it will not use them, whatever code is compiled.
I agree, we cannot trust that; it might change in new GCC versions or even when compiling slightly different code. Besides, optimizing FP usage in -O0 doesn't make sense...
So, I am still thinking around this issue. I am involving @tejlmand (Toolchain responsible and build-system maintainer) to get his view.
What I can tell about the workaround is:
This is not enough though: we need to treat all threads as having the K_FP_REGS flag set.
There is already the following code in function prepare_multithreading() in kernel/init.c:
#if defined(CONFIG_FPU) && defined(CONFIG_FPU_SHARING)
/* Enable FPU in main thread */
opt |= K_FP_REGS;
#endif
but I am not a Zephyr threading expert, so I don't know whether it is enough.
Otherwise I am in line with your statement. About extra memory, I can already tell that we need to enlarge CONFIG_MAIN_STACK_SIZE (see #31472), otherwise some tests will fail. Note that the CONFIG_MAIN_STACK_SIZE enlargement is also required for stm32f3_disco for some other tests (tests/kernel/threads/tls/, tests/kernel/fatal/exception/, tests/kernel/common/), independently of the FPU. But I was waiting for the conclusion/fix on this issue to kill two birds with one stone :smiley: The proposal would thus be to update the default from 512 to 768 in kernel/Kconfig:
config MAIN_STACK_SIZE
...
default 768 if ZTEST && !(RISCV || X86)
Thanks @ABOSTM, I am not worried about how to implement the workaround, if we finally decide to do it.
Another option would be to consider the solution presented in https://gcc.gnu.org/pipermail/gcc-help/2020-July/139112.html, that is, to apply the general-regs-only GCC directive (-mgeneral-regs-only) to code that is not supposed to use the FP registers.
That burden falls on the user, throughout the code base, though, so I am reluctant...
It seems the only reasonable solution at this time is to disallow CONFIG_FPU_SHARING=n for ARM targets when building with GCC, until the GCC guys decide to add -mno-implicit-float or something equivalent.
@stephanosio thanks so much for your input. I started re-writing the issue description according to the discussion we are having here.
Just to clarify, do you think this affects AArch32 other than the -M profile?
@ioannisg As far as I can see, this applies to all ARM architecture variants. It is, however, not an immediate problem for the non-M profile variants at the moment because the relevant Zephyr arch ports do not support hardware FPU (and therefore do not allow enabling FP instructions).
@stephanosio thanks. Have you been able to dig deep into this? I wonder if GCC allows only callee-saved FP registers to be accessed by non-floating point calculations (i.e. s16 and above), or is it caller-saved too?
@katsuster could you check if this issue is relevant for Risc-V? For Cortex-M we have considered this to be a serious issue.
@dcpleung @abrodkin FYI - it might be worth checking if this is affecting x86 and ARC, respectively.
@ioannisg It seems that RISC-V GCC does not use FPU implicitly. Current GCC does not support SIMD of RISC-V instructions (packed-simd extension is not stable now).
Of course, I'm not sure about future events:
@ioannisg x86 does not seem to be affected by this. FPU/SSE have to be explicitly enabled before those registers can be used on 32-bit. On 64-bit, FPU/SSE are always available so we are saving those registers anyway.
Describe the bug
Unshared FP Services mode is supposed to work properly when a single thread uses the FPU. However, it looks like GCC (version 9 and above) generates code that temporarily stores integer data in FPU registers, even if no FPU math operations are carried out within a function. The values in the FPU registers can be overwritten by other threads; the modified values are then loaded back in the original function and corrupt the program. This happens because, in unshared FP registers mode, Cortex-M (and perhaps other architectures) does not save/restore the FP context during a thread context switch or even during exception entry and return.
A straightforward workaround for this problem is to force Shared FP Registers mode for Cortex-M whenever CONFIG_FPU is enabled, by setting CONFIG_FPU_SHARING=y. This enables FP context preservation.
To Reproduce
An out-of-tree closed application was used on nRF52840 with CONFIG_FPU=y and CONFIG_FPU_SHARING=n. The issue reproduces with some toolchains only; see "Environment" below.
Additionally, the issue is reproduced by in-tree code, see #31472.
Expected behavior
Using Unshared FP Services mode should be either safe to use (assuming only one thread operates on floats) or disallowed (not selectable through Kconfig) if not supported. (As described above, this is not sufficient: we need to ensure FP Sharing mode is also working as expected.)
Impact
Using Unshared FP Services mode with ARM Cortex-M4F (and possibly others) results in undefined behavior.
Code listing
Here is the function that temporarily stores into an FPU register (vmov s16, r3) and then loads from it later (vmov r1, s16).
Environment
Additional context
A similar issue was reported here: https://answers.launchpad.net/gcc-arm-embedded/+question/691604