Illegal instructions generated for Arm Cortex-R5 processor when compiling code using floating-point operations

rikyborg commented 1 month ago

When targeting the Arm Cortex-R5 processor, rustc and LLVM generate assembly containing hardware floating-point instructions. These instructions are not available on the Cortex-R5 (only on the Cortex-R5F, which has an FPU) and cause the processor to halt.

Details

For example, the following code:

fn add_s(a: f32, b: f32) -> f32 {
    a + b
}

fn add_d(a: f64, b: f64) -> f64 {
    a + b
}

compiled with the flags `--target armv7r-none-eabi -C opt-level=3 -C target-cpu=cortex-r5` generates the following assembly (Compiler Explorer link):

add_s:
        vmov    s0, r1
        vmov    s2, r0
        vadd.f32        s0, s2, s0
        vmov    r0, s0
        bx      lr

add_d:
        vmov    d0, r2, r3
        vmov    d1, r0, r1
        vadd.f64        d0, d1, d0
        vmov    r0, r1, d0
        bx      lr

Note the `vadd.f32` and `vadd.f64` instructions: they are available on the Cortex-R5F, which has an FPU, but are illegal instructions on the Cortex-R5 without one.
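To spot this kind of problem in a listing without reading every line, one could scan for VFP mnemonics. The sketch below is a heuristic, not an existing tool: it assumes GNU-style Arm assembly (`@` comments, labels ending in `:`) and relies on the fact that, in unified (UAL) syntax, VFP and Advanced SIMD mnemonics start with `v` while core integer instructions do not. The function name `vfp_instructions` is hypothetical.

```rust
/// Heuristic: return the mnemonics of VFP/Advanced SIMD instructions found in
/// an Arm assembly listing. In UAL syntax these mnemonics start with `v`
/// (`vmov`, `vadd.f32`, `vpush`, ...); core integer instructions do not.
fn vfp_instructions(asm: &str) -> Vec<String> {
    asm.lines()
        // Drop `@` comments (GNU assembler syntax for Arm) and surrounding whitespace.
        .map(|line| line.split('@').next().unwrap_or("").trim())
        // Skip blank lines, labels such as `add_s:` and directives such as `.text`.
        .filter(|line| !line.is_empty() && !line.ends_with(':') && !line.starts_with('.'))
        // The first whitespace-separated token is the mnemonic.
        .filter_map(|line| line.split_whitespace().next())
        .filter(|mnemonic| mnemonic.starts_with('v') || mnemonic.starts_with('V'))
        .map(str::to_string)
        .collect()
}

fn main() {
    // `add_s` as generated with `-C target-cpu=cortex-r5` (from the report).
    let hard = "add_s:
        vmov    s0, r1
        vmov    s2, r0
        vadd.f32        s0, s2, s0
        vmov    r0, s0
        bx      lr";
    // `add_s` as generated without `target-cpu` (calls the AEABI helper instead).
    let soft = "add_s:
        push    {r11, lr}
        bl      __aeabi_fadd
        pop     {r11, pc}";
    println!("hard-float listing: {:?}", vfp_instructions(hard));
    println!("soft-float listing: {:?}", vfp_instructions(soft));
}
```

On a core without an FPU, any non-empty result is a red flag; an empty result only means no VFP mnemonics were found by this simple scan.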

My expectation is that, with the target `armv7r-none-eabi` (rather than `armv7r-none-eabihf`) and the target CPU `cortex-r5` (rather than `cortex-r5f`), rustc and LLVM would generate legal instructions for the processor, i.e. call software floating-point routines instead of emitting hard-float instructions.

Relevant information

Original thread on URLO: Unexpected codegen for Cortex-R5 without FPU.

The upstream LLVM CPU model for `cortex-r5` includes the feature `FeatureVFP3_D16`, which seems correct for the R5F but wrong for the R5. However, manually disabling that feature doesn't seem to help (see below). Link to the LLVM repo at tag 18.1.7.

Information from the ARM Cortex-R Series Programmer's Guide: 6.1.6. VFP in the Cortex-R processors.

Things that do work

Skipping the target-cpu flag

Compiling the snippet above with flags `--target armv7r-none-eabi -C opt-level=3` generates:

```asm
add_s:
        push    {r11, lr}
        bl      __aeabi_fadd
        pop     {r11, pc}

add_d:
        push    {r11, lr}
        bl      __aeabi_dadd
        pop     {r11, pc}
```

Adding the soft-float target feature

Compiling the snippet above with flags `--target armv7r-none-eabi -C opt-level=3 -C target-cpu=cortex-r5 -C target-feature=+soft-float` generates:

```asm
add_s:
        push    {r11, lr}
        bl      __aeabi_fadd
        pop     {r11, pc}

add_d:
        push    {r11, lr}
        bl      __aeabi_dadd
        pop     {r11, pc}
```

However, rustc generates the warning:

```
warning: unknown and unstable feature specified for `-Ctarget-feature`: `soft-float`
  |
  = note: it is still passed through to the codegen backend, but use of this feature might be unsound and the behavior of this feature can change in the future
  = help: consider filing a feature request

warning: 1 warning emitted
```

Things that don't work

Removing the vfp3d16 target feature

Compiling the snippet above with flags `--target armv7r-none-eabi -C opt-level=3 -C target-cpu=cortex-r5 -C target-feature=-vfp3d16` generates:

```asm
add_s:
        vmov    s0, r1
        vmov    s2, r0
        vadd.f32        s0, s2, s0
        vmov    r0, s0
        bx      lr

add_d:
        vmov    d0, r2, r3
        vmov    d1, r0, r1
        vadd.f64        d0, d1, d0
        vmov    r0, r1, d0
        bx      lr
```

Meta

I've tested both the stable and nightly toolchains, with the same results.

rustc --version --verbose:

rustc 1.80.0 (051478957 2024-07-21)
binary: rustc
commit-hash: 051478957371ee0084a7c0913941d2a8c4757bb9
commit-date: 2024-07-21
host: x86_64-unknown-linux-gnu
release: 1.80.0
LLVM version: 18.1.7

tgross35 commented 1 month ago

As mentioned on the thread, this seems to come from LLVM. Would you mind filing an issue there? Here is an `llc` reproduction showing that `cortex-r5` is the cause: https://llvm.godbolt.org/z/zzK7KGv95

tgross35 commented 1 month ago

Whoops, didn't mean to remove prioritize but I think the labels raced.

workingjubilee commented 1 month ago

cc @chrisnc as the maintainer for this target.

We updated the float support in https://github.com/rust-lang/rust/pull/123159 to be more correct for this target, but it looks like we didn't quite stick the landing, I guess?

> Processors in this family include the Arm Cortex-R4, 5, 7, and 8.

We most definitely intentionally included the R5 here, though apparently we assume an R5F.

chrisnc commented 1 month ago

LLVM's policy for Arm `target-cpu` is that it will enable the maximal set of features that the chosen CPU might support, and it expects users to disable the ones they don't have or want. So it is correct and expected that the compiler assumes `vfp3d16` when you specify `cortex-r5`, even when using the non-hf target. The term "R5F" is just shorthand for "a Cortex-R5 with FPU support", and for LLVM this is the intended default. Arm doesn't define it as a separate CPU, and neither does LLVM. (Edit: looks like R4(F) is an exception to this in LLVM.)

Note also that the examples still take the `f32` arguments in the integer registers, which doesn't happen on the hf target, so overall the generated code is correct.

The only thing I don't understand is why `target-feature=-vfp3d16` did not work as expected, so I will investigate that~~, and see if this is new as of #123159 or was always this way~~. Edit: no, this can't be affected by that change, because we only modified the hf targets to use slightly different default features so they would compose better with `target-cpu`, but this case is about the non-hf targets. Still checking on the behavior there...

chrisnc commented 1 month ago

`--target armv7r-none-eabi -C opt-level=3 -C target-cpu=cortex-r5 -C target-feature=-vfp2sp,-fp64`

https://godbolt.org/z/cj38s8Ev6

This will also disable the floating-point support. I think the issue is that the implied `+vfp3d16` from `cortex-r5` causes multiple other features to be enabled, but `-vfp3d16` doesn't disable those dependencies. Just using `+soft-float` in your project is still probably the right approach here, rather than referring to the VFP feature tree internal to LLVM.
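The asymmetry described above can be illustrated with a toy model of feature implication. This is a sketch under stated assumptions, not LLVM's code: the implication table below is a hypothetical simplification loosely modeled on LLVM's Arm VFP features (the real tree has more entries and the exact edges may differ). The model assumes that enabling a feature also enables everything it implies, while disabling a feature only removes the features that imply it, not the ones it implies; as far as we understand, LLVM's subtarget feature handling behaves this way. The "hardware f32/f64" predicates are also simplifications.

```rust
use std::collections::BTreeSet;

type Features = BTreeSet<String>;

/// Hypothetical, simplified implication table loosely modeled on LLVM's Arm
/// VFP features: (feature, features it directly implies).
const IMPLIES: &[(&str, &[&str])] = &[
    ("vfp2sp", &[]), // single-precision VFP
    ("fp64", &[]),   // double-precision support
    ("vfp2", &["vfp2sp", "fp64"]),
    ("vfp3d16sp", &["vfp2sp"]),
    ("vfp3d16", &["vfp2", "vfp3d16sp"]),
];

fn direct_implies(feature: &str) -> &'static [&'static str] {
    for &(name, implied) in IMPLIES {
        if name == feature {
            return implied;
        }
    }
    &[]
}

/// Enabling a feature also enables everything it (transitively) implies.
fn enable(set: &mut Features, feature: &str) {
    set.insert(feature.to_string());
    for &implied in direct_implies(feature) {
        enable(set, implied);
    }
}

/// Disabling a feature also disables every feature that (transitively)
/// implies it, but leaves the features it implies untouched.
fn disable(set: &mut Features, feature: &str) {
    set.remove(feature);
    for &(other, implied) in IMPLIES {
        if implied.iter().any(|&i| i == feature) {
            disable(set, other);
        }
    }
}

/// Apply a comma-separated `target-feature`-style string, left to right.
fn apply(set: &mut Features, flags: &str) {
    for item in flags.split(',').map(str::trim).filter(|s| !s.is_empty()) {
        if let Some(name) = item.strip_prefix('+') {
            enable(set, name);
        } else if let Some(name) = item.strip_prefix('-') {
            disable(set, name);
        }
        // Items without a sign are ignored in this sketch.
    }
}

// Simplified predicates: f32 in hardware needs `vfp2sp`; f64 also needs `fp64`.
fn hw_f32(set: &Features) -> bool {
    set.contains("vfp2sp")
}

fn hw_f64(set: &Features) -> bool {
    set.contains("vfp2sp") && set.contains("fp64")
}

/// Model of the `cortex-r5` defaults: the CPU model enables `vfp3d16`.
fn cortex_r5() -> Features {
    let mut set = Features::new();
    enable(&mut set, "vfp3d16");
    set
}

fn main() {
    for flags in ["", "-vfp3d16", "-vfp2sp,-fp64"] {
        let mut set = cortex_r5();
        apply(&mut set, flags);
        println!(
            "{flags:?}: f32 in hw = {}, f64 in hw = {}, features = {set:?}",
            hw_f32(&set),
            hw_f64(&set)
        );
    }
}
```

Under these assumptions, `-vfp3d16` removes only `vfp3d16` itself and leaves `vfp2sp`, `vfp2` and `fp64` enabled, so both additions still use VFP instructions, which matches the `-vfp3d16` listing in the report. `-vfp2sp,-fp64` removes every VFP feature in the table.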

rikyborg commented 1 month ago

Thanks a lot @chrisnc for investigating this issue!

> Just using `+soft-float` in your project is still probably the right approach here, rather than referring to the VFP feature tree internal to LLVM.

Yes, right now that seems the clearest approach. There's the downside that rustc will keep throwing the warning "unknown and unstable feature specified for `-Ctarget-feature`: `soft-float`". Is there any way to suppress that?

Or should the non-hf target already include the +soft-float feature? If the target does not have hard-float support, is there any reason to generate code using floating-point instructions? (I might be missing something here...)

> As mentioned on the thread, this seems to come from LLVM. Would you mind filing an issue there?

Should I still go ahead and file an issue with LLVM as suggested initially by @tgross35?

One source of confusion for me is that LLVM seems to distinguish between Cortex-R4 and Cortex-R4F, but not between Cortex-R5 and Cortex-R5F.

apiraino commented 1 month ago

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-medium

chrisnc commented 1 month ago

> Yes, right now that seems the clearest approach. There's the downside that rustc will keep throwing the warning "unknown and unstable feature specified for `-Ctarget-feature`: `soft-float`". Is there any way to suppress that?

I don't think there's a way to disable this right now except to use nightly. I can't find an open issue for stabilizing these right now, but that is the path. This warning is relatively recent, and this issue gives some explanation of why it's there: https://github.com/rust-lang/rust/pull/117616.

> Or should the non-hf target already include the `+soft-float` feature? If the target does not have hard-float support, is there any reason to generate code using floating-point instructions? (I might be missing something here...)

The hf-ness is about the ABI, not what the target could support if the user asks for it (which was the case here). It's perfectly valid to use the soft-float ABI on a core that has an FPU, and the compiler can use the VFP instructions for things other than computation on f32/f64 (or you might legitimately need to do floating-point computation while calling/being called by code that uses the soft-float ABI because you can't re-compile it). This is consistent with how clang/llvm handle this; the expectation is that you use a combination of target-cpu+target-feature to specify what you want, rather than having the ABI choice forcibly disable some features.
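The point that the float ABI and the available instructions are independent choices can be laid out as a small table in code. The sketch below is a hypothetical toy model, not how rustc or LLVM are structured: the soft-float rows reproduce the `add_s` listings from this thread, the hard-float row with an FPU is what we would expect for `armv7r-none-eabihf` (not output taken from the report), and treating "hard-float ABI without an FPU" as an error is this model's simplifying assumption.

```rust
/// Which calling convention is used for `f32`/`f64` values.
#[derive(Clone, Copy, Debug)]
enum FloatAbi {
    Soft, // e.g. armv7r-none-eabi: float arguments passed in core registers
    Hard, // e.g. armv7r-none-eabihf: float arguments passed in VFP registers
}

/// Toy lowering of `a + b` for `f32`, given the ABI and whether an FPU exists.
fn lower_add_f32(abi: FloatAbi, has_fpu: bool) -> Result<Vec<&'static str>, &'static str> {
    match (abi, has_fpu) {
        // Soft-float ABI on a core with an FPU (the codegen in this report):
        // move the arguments into VFP registers, add, and move the result back.
        (FloatAbi::Soft, true) => Ok(vec![
            "vmov s0, r1",
            "vmov s2, r0",
            "vadd.f32 s0, s2, s0",
            "vmov r0, s0",
            "bx lr",
        ]),
        // Soft-float ABI without an FPU: call the AEABI runtime helper.
        (FloatAbi::Soft, false) => Ok(vec!["push {r11, lr}", "bl __aeabi_fadd", "pop {r11, pc}"]),
        // Hard-float ABI with an FPU: the arguments already arrive in s0 and s1.
        (FloatAbi::Hard, true) => Ok(vec!["vadd.f32 s0, s0, s1", "bx lr"]),
        // In this model, the hard-float ABI needs VFP registers to pass arguments.
        (FloatAbi::Hard, false) => Err("hard-float ABI requires an FPU"),
    }
}

fn main() {
    for abi in [FloatAbi::Soft, FloatAbi::Hard] {
        for has_fpu in [true, false] {
            match lower_add_f32(abi, has_fpu) {
                Ok(insns) => println!("{abi:?}, fpu = {has_fpu}: {}", insns.join("; ")),
                Err(e) => println!("{abi:?}, fpu = {has_fpu}: error: {e}"),
            }
        }
    }
}
```

The non-hf target fixes only the first axis; whether `vadd.f32` or `bl __aeabi_fadd` is emitted depends on the second, which is what `target-cpu` and `target-feature` control.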

> > As mentioned on the thread, this seems to come from LLVM. Would you mind filing an issue there?
>
> Should I still go ahead and file an issue with LLVM as suggested initially by @tgross35?

> One source of confusion for me is that LLVM seems to distinguish between Cortex-R4 and Cortex-R4F, but not between Cortex-R5 and Cortex-R5F.

I think that would be worthwhile, at least to understand why there is a separate `cortex-r4f` but not for the others. I will speculate that this is for historical reasons and that they would remove it if not for backward compatibility, because it is inconsistent. Anecdotally, I think no-FPU Cortex-R4 parts are much more common than no-FPU Cortex-R5 parts, which might also explain the exception, but I don't know the provenance of the one you are using. When using clang, you would pass `-mcpu=cortex-r5+nofp` for your case, which obviates the need for a separate CPU name or for explicitly disabling exactly the FPU features that are included with `cortex-r5`.

workingjubilee commented 1 month ago

> Yes, right now that seems the clearest approach. There's the downside that rustc will keep throwing the warning "unknown and unstable feature specified for `-Ctarget-feature`: `soft-float`". Is there any way to suppress that?

We should probably just add this feature.

workingjubilee commented 1 month ago

@chrisnc For note, LLVM seems to have flip-flopped on the maximal vs. minimal thing for Arm CPUs at least twice now, and recent Arm CPUs (AArch64, mostly) are more likely to use a minimal feature set, though that may only be true from Armv9 onwards.

chrisnc commented 1 month ago

> > Yes, right now that seems the clearest approach. There's the downside that rustc will keep throwing the warning "unknown and unstable feature specified for `-Ctarget-feature`: `soft-float`". Is there any way to suppress that?

> We should probably just add this feature.

Whoops, I missed that this was also unknown, not just unstable, when I wrote my comment. I'm not sure this will be possible though, because it affects the ABI, and the reasoning here applies to Arm also.

Should rustc consider something like clang's `-mcpu`, which allows things like `cortex-r5+nofp`? I'm not sure if this is exposed in a reusable way by LLVM, but it could go a long way in solving these types of problems.
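To make the suggestion concrete, here is a sketch of how a clang-style `cpu+extension` string could be translated into the rustc flags available today. Everything here is hypothetical: `mcpu_to_rustc_flags` is not an existing rustc option or API, only the `nofp` modifier is handled, and mapping `nofp` to `-vfp2sp,-fp64` is an assumption taken from the workaround earlier in this thread, not from clang's own extension table.

```rust
/// Hypothetical helper (not an existing rustc option): translate a clang-style
/// `-mcpu` value such as `cortex-r5+nofp` into rustc `-C` flags.
fn mcpu_to_rustc_flags(mcpu: &str) -> Result<Vec<String>, String> {
    let mut parts = mcpu.split('+');
    let cpu = parts
        .next()
        .filter(|c| !c.is_empty())
        .ok_or("missing CPU name")?;

    let mut features: Vec<&str> = Vec::new();
    for ext in parts {
        match ext {
            // Assumption: `nofp` maps to the workaround used in this thread.
            "nofp" => features.extend(["-vfp2sp", "-fp64"]),
            other => return Err(format!("unsupported extension `{other}`")),
        }
    }

    let mut flags = vec!["-C".to_string(), format!("target-cpu={cpu}")];
    if !features.is_empty() {
        flags.push("-C".to_string());
        flags.push(format!("target-feature={}", features.join(",")));
    }
    Ok(flags)
}

fn main() {
    match mcpu_to_rustc_flags("cortex-r5+nofp") {
        Ok(flags) => println!("{}", flags.join(" ")),
        Err(e) => eprintln!("error: {e}"),
    }
}
```

A real implementation would want the CPU-specific extension table from LLVM rather than a hard-coded mapping, since the set of FPU features to disable differs between cores.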

> For note, LLVM seems to have flipflopped on the maximal vs. minimal thing for Arm CPUs at least twice now, and recent Arm CPUs (aarch64, mostly) are more likely to use a minimal featureset, though that may only be true from Armv9 onwards.

Interesting... At least for the v7m, v7r, v8m, and v8r cores, the default feature sets have all been maximal when each core was added, and have not changed since except for refactoring and bug fixes, from what I can see; A-profile and AArch64 are quite a bit more varied and may have different considerations. It seems that R4 as distinct from R4F was done originally in GCC in 2008 and LLVM then followed it, but no other exceptions exist, even though "M4F", "R5F", and others are commonly mentioned and a non-trivial population of those cores have no FPU.
