mattairtech / ArduinoCore-samd

This is a fork from arduino/ArduinoCore-samd on GitHub. This will be used to maintain Arduino support for SAM D|L|C (M0+ and M4F) boards including the MattairTech Xeno Mini and the MT-D21E (see https://www.mattairtech.com/). It adds support for new devices like the D51, L21, C21, and D11. It also adds new clock sources, like a high speed crystal or internal oscillator.

Binaries very large compared to AVR #17

Open timonsku opened 6 years ago

timonsku commented 6 years ago

I'm not sure how specific this is to the SAMD11 implementation, so apologies if this is the wrong place to report or ask about it. I'm currently running into the issue that a lot of code compiles to a much smaller size on AVR platforms, and sometimes to two to five times that size on the SAMD11. For example, when I compile this example with a SAMD11 selected, it comes to >25 KB (USB disabled), but with an ATmega328P selected it compiles to only 7 KB. Are some optimizations missing from the compiler flags, or does the architecture really have that much overhead?

Compilation of the example mentioned above terminates with: h_SDI-12_slave_implementation.ino.elf section `.text' will not fit in region `FLASH'. Could some symbols be getting linked that aren't supposed to be?

I'm using the beta board .json for the board manager.

Edit: Compiling the BareMinimum example results in a ~4 KB binary, again with USB disabled. On AVR it is around 444 bytes.

mattairtech commented 6 years ago

It looks like the code you referenced above is pulling in a lot of floating-point code. I checked with the upstream SAMD core and it has the same issue. I will need to investigate more, but I suspect the fix won't be as simple as changing compiler flags. I should have a little more time on Tuesday to look into it. As for the BareMinimum sketch, its size can be brought down to 2.5 KB (without USB or serial). It would be possible to get this lower, but not anywhere near the size of the AVR build: much more setup code is required than on the AVRs, and the interrupt vector table for the 32-bit D11 alone is 140 bytes. I will look into making the core smaller for the next beta release (possibly with more menu options); however, that is not related to the issue you are having.

timonsku commented 6 years ago

Awesome, thanks for looking into it! :) The base 4 KB would be OK if other code didn't explode in size so much. I guess the Arduino folks didn't care much about optimization given the huge 256 KB of flash available on the official SAMD21 boards.

mattairtech commented 6 years ago

I have been really busy lately and have not had a chance to look into this. I will definitely have time on Friday or Saturday.

timonsku commented 6 years ago

No stress, I'm glad you're having a look at all :)

mattairtech commented 6 years ago

I will still need more time to work on this. It looks like double-precision floating-point code is being pulled in, which is much larger than single-precision code. This may be related to float and double being treated the same on AVR8. I would like to support both 4-byte float and 8-byte double, with floats handled by the FPU on the D51 and doubles still done entirely in software. I've also been thinking of several other code-space-saving improvements that I will implement as well. I will have much more time this week to work on this, since I had already scheduled time to get another beta released, hopefully by the end of the week. I'll post updates here.

timonsku commented 6 years ago

This example also seems fairly inflated, overflowing flash by ~17 KB (with USB_CDC, no UART): https://github.com/adafruit/Adafruit_BME280_Library/tree/master/examples/bme280test

Just removing the bmp.readAltitude(1013.25) call reduced the binary size by ~6 KB. You might be on to something with the floating-point math. Leaving out all three sensor reads in the example makes it compile, shaving off ~19 KB of the binary.

mattairtech commented 6 years ago

OK, so I was busy with taxes and orders, but I have had more time to look into this today and have a much better understanding of the problem and how to fix it. I should have a solution over the weekend that greatly reduces code space usage when using the print() method with floats. Printing floats goes through String.cpp, which uses dtostrf(). In the SAMD core, dtostrf() simply calls sprintf() (the larger floating-point version), which in turn pulls in malloc(), etc. The AVR core used the dtostrf() from AVR-libc, which uses a simpler and smaller conversion method than sprintf() but is partially written in AVR assembly. So I will reimplement dtostrf() the AVR way, but in C instead of assembly. I will also need to support both single and double precision in some way.
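A minimal sketch of the general idea follows. It assumes single-precision input and a limited range. The approach is to scale the value to an integer, then emit the digits directly, with no sprintf() and no heap. This is not AVR-libc's dtostrf() algorithm and not ftoa_engine. The name `ftoa_fixed` is hypothetical, though its width/prec convention (negative width left-justifies) mirrors dtostrf().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical dtostrf()-style formatter: writes val with `prec` digits
   after the decimal point into `out`, padded to |width| characters
   (right-justified if width > 0, left-justified if width < 0).
   Assumptions: prec <= 6, |val| * 10^prec < 2^32, val is finite.
   `out` must hold at least max(|width|, 18) + 1 bytes. */
char *ftoa_fixed(float val, int width, unsigned prec, char *out)
{
    char buf[20];   /* digits are built here in reverse order */
    int n = 0;
    assert(prec <= 6);

    int neg = val < 0.0f;
    if (neg) val = -val;

    uint32_t scale = 1;
    for (unsigned i = 0; i < prec; i++) scale *= 10u;

    /* Round half up at the last printed digit, then work in integers. */
    uint32_t scaled = (uint32_t)(val * (float)scale + 0.5f);
    uint32_t ipart = scaled / scale;
    uint32_t fpart = scaled % scale;

    for (unsigned i = 0; i < prec; i++) {   /* fraction, last digit first */
        buf[n++] = (char)('0' + fpart % 10u);
        fpart /= 10u;
    }
    if (prec > 0) buf[n++] = '.';
    do {                                    /* integer part, last digit first */
        buf[n++] = (char)('0' + ipart % 10u);
        ipart /= 10u;
    } while (ipart > 0);
    if (neg) buf[n++] = '-';

    int w = width < 0 ? -width : width;
    int pad = w > n ? w - n : 0;
    char *p = out;
    if (width > 0) while (pad-- > 0) *p++ = ' ';
    while (n > 0) *p++ = buf[--n];          /* reverse into out */
    if (width < 0) while (pad-- > 0) *p++ = ' ';
    *p = '\0';
    return out;
}
```

For example, `ftoa_fixed(-2.5f, 6, 1, buf)` yields `"  -2.5"`. On a Cortex-M0+, the float multiply, the float-to-integer conversion, and the 32-bit division still call small runtime helpers, but nothing like the printf machinery or malloc().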

mattairtech commented 6 years ago

I have made much progress on this, but I still have some testing to do. The h_SDI-12_slave_implementation.ino sketch now compiles to under 9 KB, and the BareMinimum sketch can be reduced to under 1.4 KB with new compile options (which can be enabled in the menu) and a new, optional, more compact PinDescription table format. I will have more details soon. A quick correction: the print() methods have nothing to do with the String methods. They use completely different methods to convert a floating-point number into ASCII (details later).
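To see why a more compact PinDescription table helps, note that every entry is stored in flash once per pin. The sketch below compares two hypothetical entry layouts. The "wide" one has eight 32-bit fields, roughly the shape of the upstream core's PinDescription, whose enum-typed fields typically take 4 bytes each on ARM. The compact one stores the same fields in single bytes. Neither struct is the actual MattairTech PIN_DESCRIPTION_TABLE_SIMPLE layout, which this thread does not describe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "wide" entry: eight 32-bit fields per pin. */
typedef struct {
    uint32_t port, pin, type, attribute;
    uint32_t adc_channel, pwm_channel, tc_channel, ext_int;
} PinDescWide;

/* Hypothetical compact entry: the same fields stored in single bytes.
   Assumes every value (including the attribute flags) fits in 0..255. */
typedef struct {
    uint8_t port, pin, type, attribute;
    uint8_t adc_channel, pwm_channel, tc_channel, ext_int;
} PinDescCompact;

/* Flash bytes consumed by a constant table of n_pins entries. */
static size_t table_bytes(size_t entry_size, size_t n_pins)
{
    return entry_size * n_pins;
}
```

With an illustrative 20 pins, the table costs 640 bytes in the wide form versus 160 bytes in the compact form. That difference is significant against a 16 KB part.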

timonsku commented 6 years ago

That is great news! :) If there is anything public to test yet let me know, I will give it a shot.

mattairtech commented 6 years ago

I was in the mountains for several days, but I am back now and should release a beta within a few days.

mattairtech commented 6 years ago

Finally, the beta has been released. I have not been able to test it with the D11 (I tested the other chips), so please let me know how it works. You will want to disable USB (Tools->USB Config->USB_DISABLED), disable serial (Tools->Serial Config->NO_UART_ONE_WIRE_ONE_SPI), enable config.h (Tools->Build Options->config.h enabled), use PIN_DESCRIPTION_TABLE_SIMPLE (see config.h), and use single precision (Tools->Floating Point->Print & String use separate singles and doubles).

Thanks to the authors of ftoa_engine.c (used by the String class for single-precision floats). It was originally written in assembly by Dmitry Xmelkov and rewritten in C by Soren Kuula (from the Ardupilot project). I was about to rewrite it in C myself, after working through the assembly code, when I searched and found this. Converting IEEE single-precision numbers to printable base-10 is more involved than one would think (conversion on paper is easy), so I am glad I didn't have to do it.
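Here is why the conversion is more involved than it looks. An IEEE 754 single-precision value is stored as a sign bit, an 8-bit biased exponent, and a 23-bit fraction. The number it represents is therefore a binary significand times a power of two. Printing base-10 digits means carrying out that power-of-two scaling exactly and rounding correctly. The sketch below only unpacks the three fields. The helper names are hypothetical, and it assumes `float` is IEEE binary32, as it is on ARM and x86. A full converter such as ftoa_engine.c must then turn these fields into decimal digits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical container for the three IEEE 754 binary32 fields. */
typedef struct {
    unsigned sign;      /* 1 bit */
    unsigned exponent;  /* 8 bits, biased by 127 */
    uint32_t fraction;  /* 23 bits; normal numbers have an implicit leading 1 */
} Binary32Fields;

static Binary32Fields split_float(float f)
{
    uint32_t bits;
    /* memcpy reinterprets the bytes portably without breaking strict aliasing. */
    memcpy(&bits, &f, sizeof bits);

    Binary32Fields r;
    r.sign = (unsigned)(bits >> 31);
    r.exponent = (unsigned)((bits >> 23) & 0xFFu);
    r.fraction = bits & 0x7FFFFFu;
    return r;
}
```

Take 0.1f as an example. It unpacks to exponent 123 and fraction 0x4CCCCD, so its exact value is 13421773 × 2^-27 = 0.100000001490116..., not 0.1. Deciding how many of those digits to print, and how to round them, is where most of the work lies.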

timonsku commented 5 years ago

That is a huge improvement! Thanks so much for your work.