gcc-xtensa appears to hardcode data alignment to 4

pfalcon commented 9 years ago

I'm investigating why a project built for Xtensa has an unexpectedly large data section. The project is built with -fdata-sections (which is common for embedded projects, though the issue likely manifests even without it). In the map file I see:

 .rodata.rule_yield_arg
                0x000000003ffe9cfc        0x6 build/py/parse.o
 *fill*         0x000000003ffe9d02        0x2
 .rodata.rule_yield_expr
                0x000000003ffe9d04        0x6 build/py/parse.o
 *fill*         0x000000003ffe9d0a        0x2

That is, each 6-byte structure gets aligned on a 4-byte boundary, leaving 2 bytes of fill after it. Yet these structures consist only of shorts, i.e. they have a natural alignment of 2.

The simplest testcase to reproduce the issue is:

struct foo {
    short a, b, c;
};

struct foo s1 = {1};
struct foo s2 = {2};

When built for ARM or x64, this produces the following assembly:

    .global s1
    .data
    .align  2
    .type   s1, %object
    .size   s1, 6
s1:
    .short  1
    .space  4
    .global s2
    .align  2
    .type   s2, %object
    .size   s2, 6
s2:
    .short  2
    .space  4

With xtensa-lx106-elf-gcc, the result is:

    .global s1
    .data
    .align  4
    .type   s1, @object
    .size   s1, 6
s1:
    .short  1
    .zero   4
    .global s2
    .align  4
    .type   s2, @object
    .size   s2, 6
s2:
    .short  2
    .zero   4

Note the difference in the ".align" directives.

The expected behavior is for a structure to get its natural alignment, defined as the maximum alignment of any of its fields. Is the current xtensa-lx106-elf-gcc behavior grounded in the Xtensa ABI or something similar? Even if it is, such behavior is detrimental for embedded usage, where ABI issues are not relevant but the losses from overzealous alignment are noticeable. In the original case there are hundreds of such structures, and if the structures or variables are just single shorts, 4-byte alignment wastes 50% of the space.

jcmvbkbc commented 9 years ago

That matches exactly what MIPS does. I've looked at what other targets do here and found no consensus. The most common approach is to use natural alignment when optimizing for size. I can implement that; will that work for you?

pfalcon commented 9 years ago

Yes, sure. If you think it makes sense, that would be good enough and would definitely help that project, as it's built with -Os. Thanks!

jcmvbkbc commented 9 years ago

Pushed proposed fix to the call0-4.8.2-natural-align branch for preview. Will test and integrate soon. Thanks for your report.

jcmvbkbc commented 9 years ago

Fixed and submitted upstream.

pfalcon commented 9 years ago

Thanks. I've been traveling and haven't had a chance to look into it since, but I hope to get to it in the coming weeks.

pfalcon commented 9 years ago

It turns out I never tested this properly, nor upgraded esp-open-sdk to the version with this patch. I've now tested it, using the MicroPython esp8266 build as the subject.

Here are the section address/size diffs between builds with the old and new toolchains:

 .irom0.text     0x0000000040210000    0x3fe2c
 .text           0x0000000040100000     0x7236
 .data           0x000000003ffe8000      0x574
-.rodata         0x000000003ffe8580     0x511c
-.bss            0x000000003ffed6a0     0xd3a0
+.rodata         0x000000003ffe8580     0x4c80
+.bss            0x000000003ffed200     0xd3a0

So only .rodata is affected, and more than a kilobyte was saved. For MicroPython, that could well amount to 10% of available RAM, i.e. a really good result. Thanks! (esp-open-sdk has also been upgraded.)

@dpgeorge: FYI

dpgeorge commented 9 years ago

@pfalcon thanks for the ping; it's great that such improvements can be made upstream for everyone to benefit from.

jcmvbkbc/gcc-xtensa, issue #2