llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

objcopy zero-size section, huge binaries #45644

Open 6a5a5375-364f-40e7-a2e2-55be3cb4d36f opened 4 years ago

6a5a5375-364f-40e7-a2e2-55be3cb4d36f commented 4 years ago
Bugzilla Link 46299
Version 10.0
OS other
Attachments ELF
CC @alexshap,@dwblaikie,@MaskRay,@jh7370,@rupprecht

Extended Description

I'm expecting llvm-objcopy to create a binary from the attached ELF like this:

    llvm-objcopy A.elf -O binary A.bin

Running size tells me that the binary should be 824 bytes, yet the file I get is 384MB.

Using readelf -e A.elf to inspect the section headers I can see that there is a suspicious NULL section at the very beginning which is absolutely empty. Could this be the reason why the binary gets so bloated?

Section Headers:
  [Nr] Name              Type           Addr     Off    Size   ES Flg Lk Inf  Al
  [ 0]                   NULL           00000000 000000 000000 00      0   0   0
  [ 1] .vector_table     PROGBITS       08000000 001000 000010 00   A  0   0   4
  [ 2] .version          PROGBITS       08000010 001010 000010 00   A  0   0   1
  [ 3] .text             PROGBITS       08000020 001020 000308 00  AX  0   0   4
  [ 4] .rodata           PROGBITS       08000328 001328 000000 00  AX  0   0   1
  [ 5] .ARM.exidx        ARM_EXIDX      08000328 001328 000010 00  AL  3   0   4
  [ 6] .preinit_array    PROGBITS       08000338 001338 000000 00   A  0   0   1
  [ 7] .init_array       INIT_ARRAY     08000338 001338 000004 04  WA  0   0   4
  [ 8] .fini_array       FINI_ARRAY     0800033c 00133c 000004 04  WA  0   0   4
  [ 9] .data             PROGBITS       20000000 002000 000000 00  WA  0   0   1
  [10] .data2            PROGBITS       10000000 002000 000000 00  WA  0   0   1
  [11] .bss              NOBITS         20000000 002000 0001ac 00  WA  0   0 512
  [12] ._user_heap_stack PROGBITS       200001ac 002000 000e04 00  WA  0   0   1
  [13] .ARM.attributes   ARM_ATTRIBUTES 00000000 002e04 000049 00      0   0   1
  [14] .debug_str        PROGBITS       00000000 002e4d 004795 01  MS  0   0   1
  [15] .debug_loc        PROGBITS       00000000 0075e2 001709 00      0   0   1
  [16] .debug_abbrev     PROGBITS       00000000 008ceb 000d43 00      0   0   1
  [17] .debug_info       PROGBITS       00000000 009a2e 00a9f9 00      0   0   1
  [18] .debug_ranges     PROGBITS       00000000 014427 000148 00      0   0   1
  [19] .comment          PROGBITS       00000000 01456f 00002a 01  MS  0   0   1
  [20] .debug_frame      PROGBITS       00000000 01459c 00065c 00      0   0   4
  [21] .debug_line       PROGBITS       00000000 014bf8 0012bb 00      0   0   1
  [22] .symtab           SYMTAB         00000000 015eb4 000690 10     24  60   4
  [23] .shstrtab         STRTAB         00000000 016544 000106 00      0   0   1
  [24] .strtab           STRTAB         00000000 01664a 0004b5 00      0   0   1

rupprecht commented 2 years ago

mentioned in issue llvm/llvm-bugzilla-archive#47563

llvmbot commented 4 years ago

The fix is backported to 10.0.1: llvm/llvm-project#45570

llvmbot commented 4 years ago

So, running the following on the MSxxx file produced by the latest LLD

    llvm-objcopy MSxxx -O binary A.bin

(llvm-objcopy is also built from the latest sources) results in a 768-byte output for me. It seems there is no issue; you just have to update your LLD/LLVM.

6a5a5375-364f-40e7-a2e2-55be3cb4d36f commented 4 years ago

No, I'm using 10.0.0:

    [vinci@threadripper ~]$ ld.lld -v
    LLD 10.0.0 (compatible with GNU linkers)

llvmbot commented 4 years ago

Are you using the latest LLD available? Mine is:

    umb@ubuntu:~/tests/200$ ~/LLVM/LLVM/llvm-project/build/bin/ld.lld -v
    LLD 11.0.0 (https://github.com/llvm/llvm-project.git 16b7eb6dd1247dbe322061d33636a054d6c954dc) (compatible with GNU linkers)

llvmbot commented 4 years ago

> By removing sections I meant removing them from the linker script (and from my startup code).
>
> I think I succeeded in creating a linker reproduce file. Sadly the file size limit here does not allow me to attach it directly, so I uploaded it here: https://higaski.at/repro.tar

I've tried the repro provided and the ._user_heap_stack section is SHT_NOBITS for me:

umb@ubuntu:~/tests/200$ readelf -a MSxxx
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           ARM
  Version:                           0x1
  Entry point address:               0x8000020
  Start of program headers:          52 (bytes into file)
  Start of section headers:          146144 (bytes into file)
  Flags:                             0x5000400, Version5 EABI, hard-float ABI
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         7
  Size of section headers:           40 (bytes)
  Number of section headers:         24
  Section header string table index: 22

Section Headers:
  [Nr] Name              Type           Addr     Off    Size   ES Flg Lk Inf  Al
  [ 0]                   NULL           00000000 000000 000000 00      0   0   0
  [ 1] .vector_table     PROGBITS       08000000 010000 000010 00   A  0   0   4
  [ 2] .version          PROGBITS       08000010 010010 000010 00   A  0   0   1
  [ 3] .text             PROGBITS       08000020 010020 0002c8 00  AX  0   0   4
  [ 4] .rodata           PROGBITS       080002e8 0102e8 000000 00  AX  0   0   1
  [ 5] .ARM.exidx        ARM_EXIDX      080002e8 0102e8 000010 00  AL  3   0   4
  [ 6] .preinit_array    PROGBITS       080002f8 0102f8 000000 00   A  0   0   1
  [ 7] .init_array       INIT_ARRAY     080002f8 0102f8 000004 04  WA  0   0   4
  [ 8] .fini_array       FINI_ARRAY     080002fc 0102fc 000004 04  WA  0   0   4
  [ 9] .data             PROGBITS       20000000 010300 000000 00  WA  0   0   1
  [10] .bss              NOBITS         20000000 010300 0001ac 00  WA  0   0 512
  [11] ._user_heap_stack NOBITS         200001ac 010300 000e04 00  WA  0   0   1
  [12] .ARM.attributes   ARM_ATTRIBUTES 00000000 010300 000049 00      0   0   1
  [13] .debug_str        PROGBITS       00000000 010349 004795 01  MS  0   0   1
  [14] .debug_loc        PROGBITS       00000000 014ade 001481 00      0   0   1
  [15] .debug_abbrev     PROGBITS       00000000 015f5f 000d4a 00      0   0   1
  [16] .debug_info       PROGBITS       00000000 016ca9 00a78d 00      0   0   1
  [17] .debug_ranges     PROGBITS       00000000 021436 000148 00      0   0   1
  [18] .comment          PROGBITS       00000000 02157e 00007e 01  MS  0   0   1
  [19] .debug_frame      PROGBITS       00000000 0215fc 00065c 00      0   0   4
  [20] .debug_line       PROGBITS       00000000 021c58 00128b 00      0   0   1
  [21] .symtab           SYMTAB         00000000 022ee4 000660 10     23  60   4
  [22] .shstrtab         STRTAB         00000000 023544 0000ff 00      0   0   1
  [23] .strtab           STRTAB         00000000 023643 00049c 00      0   0   1

6a5a5375-364f-40e7-a2e2-55be3cb4d36f commented 4 years ago

By removing sections I meant removing them from the linker script (and from my startup code).

I think I succeeded in creating a linker reproduce file. Sadly the file size limit here does not allow me to attach it directly, so I uploaded it here: https://higaski.at/repro.tar

llvmbot commented 4 years ago

At first I thought the issue might be caused by the ._user_heap_stack definition having no input sections, but we have a test case that covers that situation, e.g.: https://github.com/llvm/llvm-project/blob/master/lld/test/ELF/linkerscript/noload.s

And our handling in LLD looks trivial for such a simple case: https://github.com/llvm/llvm-project/blob/master/lld/ELF/ScriptParser.cpp#L765

So, to answer the question of why ._user_heap_stack is created as PROGBITS, it would be helpful to have either a small sample or a linker reproduce file (if that is acceptable). A reproduce file can be created with the --reproduce option. It creates a tar archive with all linker inputs included and can be used to debug the behavior.

jh7370 commented 4 years ago

The size tool will only give an indication of the memory footprint of the sections within a binary. It does not indicate the size of the program segments, which could theoretically be beyond that. Additionally, it is not a good guide for the binary output size, assuming my understanding of binary output is also correct (it might not be as I'm not a user of it). ._user_heap_stack is counted as data presumably because it is marked (incorrectly) as a PROGBITS section. That sounds like a bug in the linker to me, and might be the ultimate cause of the large object you're getting from llvm-objcopy.
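To make the accounting concrete, here is a rough Python sketch of how a Berkeley-style size listing could bucket the allocatable sections from the 25-section dump in the original report. The classification rule (NOBITS goes to bss, other writable sections go to data, everything else allocatable goes to text) is a simplified assumption for illustration, not the actual implementation of GNU size or llvm-size, and the names SECTIONS, berkeley_size and retype are made up for this example.

```python
# Allocatable sections from the original report's section header dump:
# (name, type, size in bytes, flags). Non-allocated sections are left out.
SECTIONS = [
    (".vector_table", "PROGBITS", 0x10, "A"),
    (".version", "PROGBITS", 0x10, "A"),
    (".text", "PROGBITS", 0x308, "AX"),
    (".rodata", "PROGBITS", 0x0, "AX"),
    (".ARM.exidx", "ARM_EXIDX", 0x10, "AL"),
    (".preinit_array", "PROGBITS", 0x0, "A"),
    (".init_array", "INIT_ARRAY", 0x4, "WA"),
    (".fini_array", "FINI_ARRAY", 0x4, "WA"),
    (".data", "PROGBITS", 0x0, "WA"),
    (".data2", "PROGBITS", 0x0, "WA"),
    (".bss", "NOBITS", 0x1AC, "WA"),
    ("._user_heap_stack", "PROGBITS", 0xE04, "WA"),
]


def berkeley_size(sections):
    """Bucket section sizes into text/data/bss (simplified, assumed rule)."""
    totals = {"text": 0, "data": 0, "bss": 0}
    for _name, sh_type, size, flags in sections:
        if "A" not in flags:
            continue  # not part of the memory image
        if sh_type == "NOBITS":
            totals["bss"] += size  # occupies memory but no file contents
        elif "W" in flags:
            totals["data"] += size  # writable and has file contents
        else:
            totals["text"] += size  # read-only or executable contents
    return totals


def retype(sections, name, new_type):
    """Return a copy of the table with one section's type replaced."""
    return [(n, new_type if n == name else t, s, f) for n, t, s, f in sections]


as_linked = berkeley_size(SECTIONS)
as_nobits = berkeley_size(retype(SECTIONS, "._user_heap_stack", "NOBITS"))
print(as_linked)  # {'text': 824, 'data': 3596, 'bss': 428}
print(as_nobits)  # {'text': 824, 'data': 8, 'bss': 4016}
```

Under this assumed rule the first result matches the arm-none-eabi-size output quoted in this thread (824 / 3596 / 428), and retyping ._user_heap_stack as NOBITS moves its 3588 bytes from data to bss.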

> I've now also tried to remove not only ._user_heap_stack but also .data2 and .bss2... still, no changes.

By this, do you mean removed from the linker script, from the input, or something else?

6a5a5375-364f-40e7-a2e2-55be3cb4d36f commented 4 years ago

Ok, checked the ELF again. Sorry about that NULL section thing. You were right in that this section is omnipresent and it's also present in an ELF generated by GCC.

6a5a5375-364f-40e7-a2e2-55be3cb4d36f commented 4 years ago

Running arm-none-eabi-size on the ELF gives me the following output:

   text    data     bss     dec     hex filename
    824    3596     428    4848    12f0 A.elf

What I also find interesting in this regard is that the ._user_heap_stack section I posted before seems to be counted as "data". The section is empty and, to my knowledge, only used to produce linker errors in case there isn't enough RAM available to allocate all static objects + minimum heap size + stack size. Generating an ELF with arm-none-eabi-gcc and the very same linker script does not count this section as "data".

GCC's ELF is also missing the NULL section at the very beginning and .ARM.exidx (which is turned off anyhow?), and it only shows two LOAD entries in the program headers:

Program Headers:
  Type  Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD  0x010000 0x08000000 0x08000000 0x00158 0x00158 RWE 0x10000
  LOAD  0x000000 0x20000000 0x20000000 0x00000 0x01190 RW  0x10000

I've now also tried to remove not only ._user_heap_stack but also .data2 and .bss2... still, no changes.

jh7370 commented 4 years ago

Thanks for the updated ELF, Vincent. Unfortunately, I don't have any more time to look at this today. Regarding the linker behaviour for NOBITS/PROGBITS, maybe George Rimar or Fangrui Song can assist. I'm assuming you're using LLD? They have more knowledge than I do in that area. Fangrui has also done some work on llvm-objcopy in the binary output area recently, so he might see something I've missed or misunderstood.

A simple solution for .data2 is to put it before .data in the linker script. That should change its order in the section header table without impacting the address in this case (I'm assuming it's given a hard-coded address in the linker script).

I probably wasn't clear with my point 4, but I think that point indicates there isn't a bug in llvm-objcopy here. Unfortunately, I haven't got an ARM-supporting GNU objcopy to verify with. I assume you do, and if so, could you try using it on the output to see the result, and let me know what the size is then, please?

6a5a5375-364f-40e7-a2e2-55be3cb4d36f commented 4 years ago

Hello James

Thank you for your fast reply. Apparently I'm an idiot. I must have attached the wrong ELF file where I've experimented with some linker script changes. I've recompiled and reattached an ELF file where the dump matches the one I posted 3 days ago.

Now the attached ELF actually contains 24 sections and .bss2 is no longer present.

The ._user_heap_stack section is still there, though, and I don't really know why it's of type PROGBITS. This section is marked as (NOLOAD) in my linker script like this:

    ._user_heap_stack (NOLOAD) :
    {
      . = ALIGN(8);
      PROVIDE ( end = . );
      PROVIDE ( _end = . );
      . = . + _Min_Heap_Size;
      . = . + _Min_Stack_Size;
      . = ALIGN(8);
    } >RAM

So was the .bss2 section in the first ELF file btw. Yet it also ended up as type PROGBITS?

I've also tried removing the ._user_heap_stack section from my linker script altogether. This also had no effect on the produced binary which was still 384MB large.

The address of .data2 indeed goes backwards, but sadly that address comes from my silicon vendor so there is no changing that.

6a5a5375-364f-40e7-a2e2-55be3cb4d36f commented 4 years ago

ELF (fixed)

jh7370 commented 4 years ago

Hi Vincent,

Just to let you know, the NULL section header is entirely normal. ELF requires there to be a single NULL section header at the start of the section header table, and it is occasionally used for special metadata.
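As an illustration of the point about the NULL entry, the following Python sketch decodes one ELF32 little-endian section header (ten 32-bit words, 40 bytes, matching the "Size of section headers: 40" line in the readelf output elsewhere in this thread) and checks whether it is the reserved all-zero entry. The input here is a synthetic 40-byte buffer standing in for the first table entry, not data read from the attached file, and parse_shdr32 is a hypothetical helper name. The "special metadata" case refers to ELF files with very many sections, where the real section count and string table index are stored in sh_size and sh_link of entry 0.

```python
import struct

# ELF32 section header layout: ten little-endian 32-bit unsigned words.
SHDR32 = struct.Struct("<10I")
SHT_NULL = 0

FIELDS = (
    "sh_name", "sh_type", "sh_flags", "sh_addr", "sh_offset",
    "sh_size", "sh_link", "sh_info", "sh_addralign", "sh_entsize",
)


def parse_shdr32(raw):
    """Decode one 40-byte ELF32 section header into a dict of fields."""
    return dict(zip(FIELDS, SHDR32.unpack(raw)))


# Synthetic stand-in for entry 0 of a section header table: all zeros.
first_entry = bytes(SHDR32.size)
hdr = parse_shdr32(first_entry)

# The reserved entry has type SHT_NULL and, in ordinary files, no address,
# offset or size.
is_null_entry = (
    hdr["sh_type"] == SHT_NULL
    and hdr["sh_addr"] == 0
    and hdr["sh_offset"] == 0
    and hdr["sh_size"] == 0
)
print(SHDR32.size, is_null_entry)  # 40 True
```

To inspect a real file, one would seek to the section header table offset from the ELF header and pass the first 40 bytes to parse_shdr32 instead of the zero buffer.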

Did you get the 824B if you run the command using GNU objcopy? I get the following output using GNU and LLVM size:

   text    data     bss     dec     hex filename
    868    3596     428    4892    131c A.elf

I also note however, that the ELF section header dump doesn't look quite the same as the one you posted:


There are 26 section headers, starting at offset 0x17544:

Section Headers:
  [Nr] Name              Type           Addr     Off    Size   ES Flg Lk Inf  Al
  [ 0]                   NULL           00000000 000000 000000 00      0   0   0
  [ 1] .vector_table     PROGBITS       08000000 001000 000010 00   A  0   0   4
  [ 2] .version          PROGBITS       08000010 001010 000010 00   A  0   0   1
  [ 3] .text             PROGBITS       08000020 001020 000334 00  AX  0   0   4
  [ 4] .rodata           PROGBITS       08000354 001354 000000 00  AX  0   0   1
  [ 5] .ARM.exidx        ARM_EXIDX      08000354 001354 000010 00  AL  3   0   4
  [ 6] .preinit_array    PROGBITS       08000364 001364 000000 00   A  0   0   1
  [ 7] .init_array       INIT_ARRAY     08000364 001364 000004 04  WA  0   0   4
  [ 8] .fini_array       FINI_ARRAY     08000368 001368 000004 04  WA  0   0   4
  [ 9] .data             PROGBITS       20000000 002000 000000 00  WA  0   0   1
  [10] .data2            PROGBITS       10000000 002000 000000 00  WA  0   0   1
  [11] .bss              NOBITS         20000000 002000 0001ac 00  WA  0   0 512
  [12] .bss2             PROGBITS       200001ac 0021ac 000000 00  WA  0   0   1
  [13] ._user_heap_stack PROGBITS       200001ac 0021ac 000e04 00  WA  0   0   1
  [14] .ARM.attributes   ARM_ATTRIBUTES 00000000 002fb0 000049 00      0   0   1
  [15] .debug_str        PROGBITS       00000000 002ff9 004795 01  MS  0   0   1
  [16] .debug_loc        PROGBITS       00000000 00778e 001bc1 00      0   0   1
  [17] .debug_abbrev     PROGBITS       00000000 00934f 000d4a 00      0   0   1
  [18] .debug_info       PROGBITS       00000000 00a099 00ad8d 00      0   0   1
  [19] .debug_ranges     PROGBITS       00000000 014e26 000148 00      0   0   1
  [20] .comment          PROGBITS       00000000 014f6e 00002a 01  MS  0   0   1
  [21] .debug_frame      PROGBITS       00000000 014f98 00065c 00      0   0   4
  [22] .debug_line       PROGBITS       00000000 0155f4 0012cf 00      0   0   1
  [23] .symtab           SYMTAB         00000000 0168c4 0006b0 10     25  60   4
  [24] .shstrtab         STRTAB         00000000 016f74 00010c 00      0   0   1
  [25] .strtab           STRTAB         00000000 017080 0004c3 00      0   0   1


I also took a look at the attached ELF, and I think it looks slightly odd to me. I suspect, though I don't know, there's something wrong with your assembly or possibly linker script. I dumped the program headers and here is the result:


There are 6 program headers, starting at offset 52

Program Headers:
  Type       Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD       0x001000 0x08000000 0x08000000 0x00364 0x00364 R E 0x1000
  LOAD       0x001364 0x08000364 0x08000364 0x00008 0x00008 RW  0x1000
  LOAD       0x002000 0x20000000 0x1800036c 0x00fb0 0x00fb0 RW  0x1000
  GNU_RELRO  0x001364 0x08000364 0x08000364 0x00008 0x00c9c R   0x1
  GNU_STACK  0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x0
  EXIDX      0x001354 0x08000354 0x08000354 0x00010 0x00010 R   0x4

Section to Segment mapping:
  Segment Sections...
   00     .vector_table .version .text .rodata .ARM.exidx
   01     .preinit_array .init_array .fini_array
   02     .data .bss .bss2 ._user_heap_stack
   03     .preinit_array .init_array .fini_array
   04
   05     .rodata .ARM.exidx
   None   .data2 .ARM.attributes .debug_str .debug_loc .debug_abbrev .debug_info .debug_ranges .comment .debug_frame .debug_line .symtab .shstrtab .strtab


Things I noticed from this and the section header dumps:

1) You appear to have a PROGBITS ._user_heap_stack section following the .bss section. This will cause the .bss section to be allocated file space in the segment, since the later section cannot be represented otherwise.
2) The .bss2 section in the attachment appears to be PROGBITS too, which suggests you have created this section with the wrong flags. This may also be the mistake with ._user_heap_stack.
3) The address of .data2 goes backwards. This is probably harmless in itself, but might indicate another problem somewhere.
4) As far as I know, the file size of a binary output will be the difference between the start address of the first non-NOBITS allocatable section (in this case .vector_table) and the end address of the last one (in this case ._user_heap_stack). This gives a required size of about 384MB.
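The arithmetic behind point 4 can be sketched in a few lines of Python. This is only the estimate described in that point (span between the lowest start and highest end address of non-NOBITS allocatable sections), applied to the addresses from the 26-section dump; it is not a model of what llvm-objcopy does internally, which may work from segment load addresses instead. The helper names binary_span and with_nobits are invented for the example.

```python
# Non-empty allocatable sections from the 26-section dump:
# (name, type, address, size). Zero-size sections (.rodata, .preinit_array,
# .data, .data2, .bss2) are omitted; they do not change the result here.
ALLOC = [
    (".vector_table", "PROGBITS", 0x08000000, 0x10),
    (".version", "PROGBITS", 0x08000010, 0x10),
    (".text", "PROGBITS", 0x08000020, 0x334),
    (".ARM.exidx", "ARM_EXIDX", 0x08000354, 0x10),
    (".init_array", "INIT_ARRAY", 0x08000364, 0x4),
    (".fini_array", "FINI_ARRAY", 0x08000368, 0x4),
    (".bss", "NOBITS", 0x20000000, 0x1AC),
    ("._user_heap_stack", "PROGBITS", 0x200001AC, 0xE04),
]


def binary_span(sections):
    """Estimated flat-binary size: span covered by sections with contents."""
    loaded = [(addr, size) for _n, t, addr, size in sections if t != "NOBITS"]
    start = min(addr for addr, _size in loaded)
    end = max(addr + size for addr, size in loaded)
    return end - start


def with_nobits(sections, name):
    """Copy of the table with the named section retyped as NOBITS."""
    return [(n, "NOBITS" if n == name else t, a, s) for n, t, a, s in sections]


as_linked = binary_span(ALLOC)
if_nobits = binary_span(with_nobits(ALLOC, "._user_heap_stack"))

print(hex(as_linked), round(as_linked / 2**20, 2))  # 0x18000fb0 384.0
print(if_nobits)                                    # 876
```

With ._user_heap_stack as PROGBITS, the span runs from flash at 0x08000000 to RAM at 0x20000fb0, which is where the roughly 384MB comes from. If that section were NOBITS, only the flash contents would remain and the estimate would drop to 876 bytes for this particular ELF. The same dump also shows point 1: 0x1ac + 0xe04 = 0xfb0, which equals the FileSiz of the third LOAD segment.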

wuwbobo2021 commented 3 months ago

This might help:
https://community.st.com/t5/stm32-mcus-products/bin-file-generated-by-gcc-too-large/m-p/444719
https://community.st.com/t5/stm32-mcus-products/bin-file-generated-by-gcc-too-large/td-p/444719