raspberrypi / pico-sdk

RP2350 FPU compiler flags #1993

Open DatanoiseTV opened 5 days ago

DatanoiseTV commented 5 days ago

See https://github.com/earlephilhower/arduino-pico/issues/2535

According to this, the RP2350 build appears to use softfp mode for floating-point operations.

SoftFP (soft floating point): Use hardware floating-point operations, but function calls pass parameters via general-purpose registers (compatible with non-FPU systems).

HardFP (hard floating point): Pass floating-point values in floating-point registers during function calls (which is faster but requires hardware FPU support).

Is there any reason for this, and would hardfp improve floating-point performance? (I am using the FPU primarily for audio DSP.)
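For reference, the ABI a given build was compiled for can be checked from the compiler's predefined macros. The following is a minimal sketch, assuming GCC or Clang targeting 32-bit Arm; `float_abi_name` is a hypothetical helper, not part of the SDK. It relies on `__ARM_PCS_VFP` (defined for the hard-float calling convention), `__SOFTFP__` (defined when no FP instructions are generated) and `__ARM_FP` (defined when hardware FP is available).

```c
/* Hypothetical helper: names the float ABI this translation unit was
   compiled for, based on predefined macros. On non-Arm hosts none of
   the branches apply and "unknown" is returned. */
const char *float_abi_name(void) {
#if defined(__ARM_PCS_VFP)
    /* -mfloat-abi=hard: FP arguments are passed in FP registers. */
    return "hard";
#elif defined(__arm__) && defined(__SOFTFP__)
    /* -mfloat-abi=soft: no FP instructions are generated at all. */
    return "soft";
#elif defined(__arm__) && defined(__ARM_FP)
    /* FP hardware is used, but arguments travel in core registers. */
    return "softfp";
#else
    return "unknown";
#endif
}
```

Printing the result of `float_abi_name()` once at startup is enough to confirm which flags actually reached the compiler.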

matsobdev commented 5 days ago

I tested this, and hard instead of softfp was initially a bit faster, but my test code used atan2f(), which returned garbage with hard. Regular multiplications and divisions turned out to be equally fast with hard and softfp.

PS. GCC 8.3.1 was the fastest of all the GCC versions (9.3.1 was a bit slower, and 10, 11, 12 and 13 were around 10% slower), but that was with some "specific" code; "regular" code did not show that difference. LLVM with LTO (whichever version) was the fastest, though with LTO I guess that applies to the code overall, not just FP.

earlephilhower commented 5 days ago

...was using atan2f()...

That's not going to work with the standard SDK setup. Many FP functions, such as atan2f and sin, are wrapped to use ROM routines. If you change the ABI of your code, you can't call those ROM routines. If you removed the -Wl,--wrap=xxxxx options from the build process, you might be able to get everything running. Run make VERBOSE=1 to see the final link stage, which will include these --wrap options.

That said, if you're doing any reasonable amount of FP ops in a function, and not just one or two, I don't imagine any gains to be very large. Moving from FP regs to normal ones is only a 1-cycle OP, I'd imagine...

kilograham commented 5 days ago

Roughly what I was going to say :smile:

  1. Sadly, GCC does not have a separate setting for single-precision vs double-precision, and double-precision is always software (so softfp is better for that).
  2. As @earlephilhower says, the overhead of softfp for single-precision only applies to non-inlined functions that take single-precision floating-point args/returns; any single-precision floating-point ops within a function will remain in FP registers.
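The second point can be illustrated with a small sketch. The function names are hypothetical and `__attribute__((noinline))` assumes GCC or Clang; the code computes the same result both ways, and the difference is only in where softfp would add register moves.

```c
#include <stddef.h>

/* Non-inlined call boundary: under softfp, x and gain are passed in
   core registers and moved into FP registers before the multiply; the
   result is moved back to a core register on return. */
__attribute__((noinline)) float apply_gain_call(float x, float gain) {
    return x * gain;
}

/* Inlinable helper: once inlined, there is no call boundary, so the
   values can stay in FP registers under either ABI. */
static inline float apply_gain_inline(float x, float gain) {
    return x * gain;
}

/* One softfp call boundary per sample. */
void process_per_call(float *buf, size_t n, float gain) {
    for (size_t i = 0; i < n; i++) {
        buf[i] = apply_gain_call(buf[i], gain);
    }
}

/* Only the outer call crosses a boundary (pointer, size, one float). */
void process_inlined(float *buf, size_t n, float gain) {
    for (size_t i = 0; i < n; i++) {
        buf[i] = apply_gain_inline(buf[i], gain);
    }
}
```

For DSP-style code that processes a whole buffer per call, the second shape means the ABI choice affects one argument per buffer rather than one per sample.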

The upshot is that we decided that the benefit of hardfp is minimal.

More importantly, as noted, using it is currently not supported, because all the assembly functions in pico_float and pico_double assume softfp (this is something we plan to fix in an upcoming SDK release).

You can probably use hardfp if you set pico_float_implementation and pico_double_implementation to compiler (though I haven't tried).