micropython / micropython

MicroPython - a lean and efficient Python implementation for microcontrollers and constrained systems
https://micropython.org
Other
19.22k stars 7.7k forks source link

RP2040: floating point performance: switch to ROM libraries? #6813

Open dhalbert opened 3 years ago

dhalbert commented 3 years ago

Hi - @kevinjwalters benchmarked MicroPython and CircuitPython on a Pico, and noticed CP was twice as fast doing floating point: https://forums.adafruit.com/viewtopic.php?f=60&t=170425#p850554 and following.

From looking at the makefiles, appears this is because we (CircuitPython) are using the custom float libraries that RPi commissioned. I thought there might be a license issue for you, but the src/rp2_common/pico_float/ files are all BSD. I think the libraries themselves are all in ROM.

Tagging @tannewt for interest.

stinos commented 3 years ago

Both are using the same floating point type (float vs double), right?

dhalbert commented 3 years ago

Yes, rp2 is:

#define MICROPY_FLOAT_IMPL                      (MICROPY_FLOAT_IMPL_FLOAT)

However, maybe it's using MICROPY_OBJ_REPR_A, which would be boxed floats. A is the default and I don't see that rp2 changes that. CircuitPython uses C, which is 30-bit immediate unboxed floats.

lurch commented 3 years ago

I think the libraries themselves are all in ROM.

Yup, here for anybody curious :slightly_smiling_face:

rgov commented 1 year ago

I attempted to add pico_double and pico_float to this list. The build succeeded, and it did link in the right files, but it made no difference to the performance. What else needs to be done to utilize the ROM libraries?

https://github.com/micropython/micropython/blob/e00a144008f368df878c12606fdbf651af2a1dc0/ports/rp2/CMakeLists.txt#L168

With benchmark results from the thread above:

Version 001AdditionA Results
CircuitPython 2.0156s,2.0156s,2.0156s,2.0156s,2.0156s (9922Hz)
MicroPython 4.2262s,4.2260s,4.2261s,4.2261s,4.2261s (4732Hz)
MicroPython with pico_float 4.0375s,4.0373s,4.0376s,4.0375s,4.0374s (4954Hz)
jimmo commented 1 year ago

@rgov MicroPython already uses the ROM libraries (and always has since the rp2 port was added). The pico-sdk does this for us. CircuitPython needs to explicitly enable it because it's not using the pico-sdk's build system (CMake).

The reason for the performance difference is what @dhalbert said here -- CircuitPython uses representation "C" by default, whereas MicroPython uses "A". (Personally, I think this is a good decision for CircuitPython, even though it comes at a loss of precision and slightly breaks IEEE754, the benefits in terms of being able to just use floats without such a big performance hit or requiring heap usage is a big win).

There was a discussion about this here: https://github.com/orgs/micropython/discussions/11730#discussioncomment-6122950 (including benchmarks that show that this is definitely what the difference is).