lvgl / lv_binding_micropython

LVGL binding for MicroPython
MIT License

Optimize program size #81

Open amirgon opened 4 years ago

amirgon commented 4 years ago

The MicroPython bindings are compiled into the firmware binary and consume a relatively large amount of program space (flash/ROM). We need to find ways to reduce program size.

Related:

embeddedt commented 4 years ago

I should measure how much space strings are taking up. The style API names are quite long and repetitive (e.g. every style property name begins with set_style_local), so we might gain some space there.
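To get a feel for how much a repeated prefix costs, here is a rough Python sketch. It assumes each name is stored as a plain NUL-terminated string and compares that against a hypothetical scheme that stores the shared prefix only once. The property names are illustrative, and real MicroPython qstr storage adds per-entry overhead (hash and length bytes) that the sketch ignores.

```python
import os


def prefix_savings(names):
    """Return (bytes_as_is, bytes_factored, shared_prefix) for `names`.

    bytes_as_is:    every name stored in full, NUL-terminated.
    bytes_factored: the longest common prefix stored once, plus each
                    name's remaining suffix (hypothetical scheme).
    """
    if not names:
        return 0, 0, ""
    prefix = os.path.commonprefix(names)  # character-wise common prefix
    as_is = sum(len(n) + 1 for n in names)  # +1 for the NUL terminator
    factored = (len(prefix) + 1) + sum(len(n) - len(prefix) + 1 for n in names)
    return as_is, factored, prefix


# Illustrative names in the style of the v7 style API.
names = [
    "set_style_local_bg_color",
    "set_style_local_text_font",
    "set_style_local_border_width",
]
as_is, factored, prefix = prefix_savings(names)
print(prefix, as_is, factored)  # set_style_local_ 80 49
```

In this model, n names sharing a prefix of length p save (n - 1) * p - 1 bytes, so the gain grows with the number of properties.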

amirgon commented 4 years ago

@embeddedt could you describe how you measured the bindings program size on stm32? I used nm -S --size-sort -r -t d lv_mpy.o for the Unix port; did you use the same method? (lv_mpy.o does not include strings, but it includes everything else.)
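The nm output can be post-processed to get per-section totals instead of reading the sorted list by eye. The sketch below assumes the BSD output format that GNU nm produces with -S and --size-sort (VALUE SIZE TYPE NAME, printed in decimal because of -t d). The sample lines are made up.

```python
from collections import defaultdict


def summarize_nm(output):
    """Sum symbol sizes per nm type letter (T/t = text, D/d = data,
    R/r = read-only data, B/b = bss).

    Expects `nm -S --size-sort -r -t d` lines: VALUE SIZE TYPE NAME.
    Lines without exactly four fields are skipped.
    """
    totals = defaultdict(int)
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue
        _value, size, sym_type, _name = fields
        totals[sym_type] += int(size, 10)
    return dict(totals)


# Made-up sample; in practice the text would come from e.g.
# subprocess.run(["nm", "-S", "--size-sort", "-r", "-t", "d", "lv_mpy.o"],
#                capture_output=True, text=True).stdout
sample = """\
0000000000001024 0000000000000512 T mp_lv_obj_set_pos
0000000000002048 0000000000000300 t helper_fn
0000000000000000 0000000000000128 d some_table
0000000000000128 0000000000000064 T mp_lv_obj_create
"""
totals = summarize_nm(sample)
code_bytes = totals.get("T", 0) + totals.get("t", 0)
print(totals, code_bytes)
```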

embeddedt commented 4 years ago

could you describe how you measured the bindings program size on stm32?

I simply measured the size of the binary that was flashed to my board. Without LVGL, it's slightly under 500KB. LVGL adds another 500KB to that size, so the binary just barely fits into my 1MB of flash.

amirgon commented 4 years ago

I simply measured the size of the binary that was flashed to my board. Without LVGL, it's slightly under 500KB. LVGL adds another 500KB to that size, so the binary just barely fits into my 1MB of flash.

@embeddedt When you say LVGL, do you include the MicroPython bindings in it? Or do you count the bindings as part of the core MicroPython code (which is not so small by itself)?

It is not clear to me how the binary size is split between these components:

embeddedt commented 4 years ago

I'm counting both the MicroPython bindings and LVGL. I don't think LVGL itself is adding more than 100K-150K, because I use it on the same board without MicroPython.

The MicroPython core itself takes up 500K. LVGL+bindings add another 500K. I haven't measured LVGL itself in the context of MicroPython.

amirgon commented 4 years ago

I'm counting both the MicroPython bindings and LVGL. I don't think LVGL itself is adding more than 100K-150K, because I use it on the same board without MicroPython.

I don't think you can compare lvgl size when used with micropython to lvgl size when used with C code.
When lvgl is compiled with micropython, the optimizer cannot optimize out any features or widgets: everything is compiled in and available at runtime (unless disabled in lv_conf.h). On the other hand, when you compile lvgl with C code, the optimizer/linker removes the code of all functions and widgets your C code doesn't use. You could only compare the two if you had a C app that used all widgets and called every possible lvgl function.
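The difference can be modelled as a reachability question. With -ffunction-sections and --gc-sections, the GNU linker keeps a function only if something it already keeps refers to it. In a C app only the called functions are referenced, while a binding's module table refers to every wrapper, and each wrapper refers to its lvgl function. The graph below is a hypothetical, heavily simplified model (the wrapper and table names are made up), not the real binding.

```python
def reachable(graph, roots):
    """Return all nodes reachable from `roots`, where graph[n] lists the
    symbols that n references (an iterative depth-first search)."""
    seen = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return seen


LVGL_FUNCS = {"lv_btn_create", "lv_obj_set_pos", "lv_chart_create", "lv_gauge_create"}

# C app: main() references only what it calls.
c_app = {"main": ["lv_btn_create", "lv_obj_set_pos"]}

# Binding (hypothetical names): a module table references every wrapper,
# and every wrapper references the lvgl function it calls.
binding = {"main": ["lvgl_module_table"],
           "lvgl_module_table": ["mp_" + f for f in sorted(LVGL_FUNCS)]}
binding.update({"mp_" + f: [f] for f in LVGL_FUNCS})

kept_c = reachable(c_app, ["main"]) & LVGL_FUNCS
kept_mpy = reachable(binding, ["main"]) & LVGL_FUNCS
print(sorted(kept_c))    # ['lv_btn_create', 'lv_obj_set_pos']
print(len(kept_mpy))     # 4
```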

embeddedt commented 4 years ago

That's a good point, and probably explains the code size. I guess it's just a limitation of dynamic languages.

The advantage is that the code size isn't likely to grow substantially beyond the text of a script. I'd still like to strip it down some more though, because there is only ~80K of flash space left for scripts/images.

stale[bot] commented 4 years ago

This issue or pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

amirgon commented 4 years ago

I spent some more time looking at lv_micropython program size on ESP32.
Currently it consumes ~2MB flash:

Nr   Section          Type       Address      Size        Flags
12   .flash.rodata    PROGBITS   0x3f400020   555.7 KiB   WRITE,ALLOC
13   .flash.text      PROGBITS   0x400d0018     1.5 MiB   ALLOC,EXECUTE
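The ~2MB figure is the sum of the two flash sections listed above (the 1.5 MiB value is itself rounded, so the total is approximate):

```latex
555.7\,\mathrm{KiB} + 1.5\,\mathrm{MiB}
  = 555.7\,\mathrm{KiB} + 1536\,\mathrm{KiB}
  \approx 2091.7\,\mathrm{KiB}
  \approx 2.04\,\mathrm{MiB}
```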

I tried elf-size-analyze to take a deeper look at these sections (it didn't work out of the box and needed some small fixes).
Here is what I got:

Conclusions:

amirgon commented 3 years ago

@embeddedt could you try running elf-size-analyze on STM32? It would be interesting to know where most of the program memory is spent there. My analysis above only applies to ESP32.

embeddedt commented 3 years ago

https://gist.github.com/embeddedt/a8f81caabaeff56f1b30110d085add0b

These numbers are heavily inflated because debug symbols are enabled. With debug enabled, the image is megabytes larger than the release build (and would not fit on the real board). I suspect this is caused by the high number of functions. I would have liked to run this test without debug symbols; however, there is no path information in that case.

I am not too concerned about RAM usage; the debug build is only using 64K and the board has a total of 320KB of SRAM plus 8MB of external RAM, so my listing below will focus only on Flash usage.

I hope that the debug symbols are only affecting the relative magnitude of the numbers and not the size distribution.
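Path information matters because it lets symbol sizes be rolled up by source location, which is the kind of per-directory breakdown elf-size-analyze prints. As a rough illustration (not how that tool is implemented), the sketch below aggregates hypothetical (symbol, size, source path) records by path prefix. Records without a path, as in a build without debug info, can only be lumped together.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def size_by_path_prefix(symbols, depth=2):
    """Sum sizes by the first `depth` components of each symbol's source
    path. `symbols` is an iterable of (name, size_bytes, path_or_None)."""
    totals = defaultdict(int)
    for _name, size, path in symbols:
        if path:
            key = str(PurePosixPath(*PurePosixPath(path).parts[:depth]))
        else:
            key = "<unknown>"  # no debug info, so no path to group by
        totals[key] += size
    return dict(totals)


# Hypothetical records; real ones would come from the ELF's debug info.
symbols = [
    ("lv_obj_create", 1200, "lib/lv_bindings/lvgl/src/lv_core/lv_obj.c"),
    ("mp_lv_obj_create", 300, "build/lvgl/lv_mpy.c"),
    ("mp_lv_btn_create", 200, "build/lvgl/lv_mpy.c"),
    ("mp_execute_bytecode", 5000, "py/vm.c"),
    ("stripped_symbol", 50, None),
]
for key, size in sorted(size_by_path_prefix(symbols).items()):
    print(f"{size:6d}  {key}")
```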

amirgon commented 3 years ago

I would have liked to run this test without debug symbols; however, there is no path information in that case.

@embeddedt There is something I don't understand here. Debug symbols are usually placed in different linker sections from code. In the ESP32 case, only the .text section is actually uploaded to the device, so debug symbols in other sections don't really affect code size on the device. Is this different in the STM32 case?
Are you sure you built with the -Os option in both cases?

Anyway, how about building without debug symbols and looking at the *.o file sizes? That would give you a general idea of the code size distribution.
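For the object-file approach, the binutils size tool (arm-none-eabi-size for an STM32 toolchain) reports the text/data/bss sections of each object, which tracks flash usage more closely than the raw file size on disk (that also includes symbol tables and relocations). A small sketch that ranks objects by text size, assuming the default Berkeley output format; the numbers in the sample are made up:

```python
def rank_by_text(size_output):
    """Parse Berkeley-format `size` output and return
    (filename, text_bytes) pairs, largest first.

    Expected columns: text data bss dec hex filename
    """
    rows = []
    for line in size_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[0] == "text":
            continue  # skip the header row and malformed lines
        rows.append((" ".join(fields[5:]), int(fields[0])))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Made-up sample, e.g. from: arm-none-eabi-size build/*.o
sample = """\
   text    data     bss     dec     hex filename
  40960     512      64   41536    a240 build/lv_obj.o
 310000    1024       0  311024   4bef0 build/lv_mpy.o
   8192       0      16    8208    2010 build/vm.o
"""
for name, text in rank_by_text(sample):
    print(f"{text:8d}  {name}")
```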

embeddedt commented 3 years ago

You're probably right; it could easily have built without optimization because I just used the standard DEBUG=1. I suppose I can modify the Makefile to emit debug symbols without changing anything else.

amirgon commented 3 years ago

As part of function pointer support (https://github.com/lvgl/lv_binding_micropython/issues/110), the binding script now identifies functions with the exact same prototype and generates a single wrapper implementation for all of them. The wrapper function receives the function pointer of the specific API function that should be called.
I haven't measured it yet, but this should improve program size significantly.
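The deduplication idea can be sketched in a few lines: key each function by its prototype and emit one wrapper per distinct key. The prototypes below are illustrative v7-style signatures written as Python tuples, and group_by_prototype is a hypothetical helper, not the binding script's actual code.

```python
from collections import defaultdict


def group_by_prototype(functions):
    """Map each distinct prototype to the functions that share it.

    `functions` maps name -> (return_type, (arg_type, ...)). A generator
    emitting one wrapper per group, which is handed the target function
    pointer at runtime, needs len(groups) wrappers, not len(functions).
    """
    groups = defaultdict(list)
    for name, prototype in functions.items():
        groups[prototype].append(name)
    return dict(groups)


SETTER = ("void", ("lv_obj_t *", "lv_coord_t"))
GETTER = ("lv_coord_t", ("const lv_obj_t *",))
functions = {
    "lv_obj_set_x": SETTER,
    "lv_obj_set_y": SETTER,
    "lv_obj_set_width": SETTER,
    "lv_obj_get_x": GETTER,
    "lv_obj_get_y": GETTER,
    "lv_obj_del": ("lv_res_t", ("lv_obj_t *",)),
}
groups = group_by_prototype(functions)
print(len(functions), "functions ->", len(groups), "wrappers")  # 6 functions -> 3 wrappers
```

The same model also illustrates the v8 concern raised later in the thread: if otherwise identical setters took different first-argument types (say, hypothetical lv_btn_t * and lv_label_t * pointers), their tuples would differ and they would fall into separate groups.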

Struct wrappers also consume a lot of program space.
Many of these structs could be opaque to the user (only passed by pointer to functions).
So another way to improve program size is to "hide" the struct fields from the user, either by declaring them as private or by declaring only the struct, without its fields, in the public API.

embeddedt commented 3 years ago

With the latest master branch, the STM32 build is now 892K and thus fits with decent room to spare in the internal flash. Nice work! 👏

amirgon commented 3 years ago

the STM32 build is now 892K

That's great!

One thing that worries me about v8: today most functions receive an lv_obj_t pointer as an argument, so there is a lot of reuse, even between widgets, because we can find functions with the exact same prototype on different widgets. However, if in v8 member functions receive a pointer to the specific widget type, we would have far fewer opportunities for wrapper reuse.

What do you think about "opaque structs"? Do you think it's reasonable to do that for v8? If we have enough structs that can be converted to opaque (or "private"), that could improve program size even more. That could only work if we don't add a lot of additional setters/getters, or if most of them share the same prototype so their wrappers can be reused. Opaque structs can be achieved by passing them as void*, or by explicitly marking them as private (with an underscore prefix, for example).
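A generator-side version of this idea might look like the sketch below: structs whose name starts with an underscore, or that are explicitly listed as opaque (e.g. only ever passed as void*), get no field accessors, so only the pointer is passed through. The struct names other than lv_area_t are hypothetical, and counting one unit of generated code per exposed field is a simplifying assumption.

```python
def exposed_fields(structs, opaque=frozenset()):
    """Return struct name -> number of fields the wrapper would expose.

    Structs named with a leading underscore ("private" by convention) or
    listed in `opaque` expose no fields.
    """
    result = {}
    for name, fields in structs.items():
        hidden = name.startswith("_") or name in opaque
        result[name] = 0 if hidden else len(fields)
    return result


structs = {
    "lv_area_t": ["x1", "y1", "x2", "y2"],
    "_hypothetical_state_t": ["flags", "counter", "next"],  # private by name
    "hypothetical_handle_t": ["impl", "refcount"],           # opaque by list
}
all_public = sum(len(f) for f in structs.values())
remaining = sum(exposed_fields(structs, opaque={"hypothetical_handle_t"}).values())
print(all_public, "->", remaining)  # 9 -> 4
```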