ldc-developers / ldc

The LLVM-based D Compiler.
http://wiki.dlang.org/LDC

Attribute to force inline methods #3214

Open WebFreak001 opened 4 years ago

WebFreak001 commented 4 years ago

I'm currently using LDC to generate LLVM IR files, which I then compile to AVR-compatible .o files and link using avr-gcc (which calls avr-ld).

My problem is that I heavily use templated structs that have no fields, only static force-inlined methods, as syntactic sugar. This currently generates around 3000 lines of useless LLVM IR in my project, which blows up the resulting ELF file by over 2 KB (increasing flash time by several seconds and leaving me with a lot less flash memory).

For reference my code:

My templated struct for transparent volatile pointers

```d
// all this code does is translating this C define to D:
// #define MEM(addr) (*((volatile T*)addr))
// so you can do
// #define MEMX MEM(0x20)
// and then use it using
// MEMX = 4;
// MEMX |= 8;

/// Helper struct to automatically call volatileStore and volatileLoad on assignment/reading of pointers
private template VolatileRef(T, alias addr)
{
    private struct VolatileRef
    {
        alias get this;

    pragma(inline, true):
        // TODO: the following functions still make it into the resulting HEX file, even though they are unused if always inlined
        // they should be removed from the hex file somehow, but they don't get stripped or removed by -Wl,--gc-sections

        static T* ptr()
        {
            return cast(T*)(addr);
        }

        static T get()
        {
            return volatileLoad(ptr);
        }

        static void opAssign(T value)
        {
            volatileStore(ptr, value);
        }

        static auto opOpAssign(string op)(T value)
        {
            T ret;
            mixin("volatileStore(ptr, ret = cast(T)(volatileLoad(ptr) " ~ op ~ " value));");
            return ret;
        }
    }
}

// enum MEM(alias addr) = VolatileRef!(ubyte, cast(ubyte*)addr);
// there are like 200 of these defines in my code:
// enum MEMX = MEM!0x20;
// MEMX = 6;
```

It would be great if LDC could skip emitting these functions as LLVM IR altogether and instead always inline their IR directly, so that it also works across modules before optimization (because of #3126).

Is there maybe an existing way (using LLVM) to strip out the unused methods? I am using --fvisibility=hidden, so they are actually marked as hidden in the .ll files but they are still being built and still exist in the final ELF file. I also tried --internalize but that made the main function non-accessible or renamed it to something which the linker couldn't find. (the .ll file still was huge though)

Otherwise it would be great to have a ldc.attributes.forceInline or something similar. Or maybe this would be fixed along with #2968 ?

Otherwise LDC is working great with embedded development using AVR! :)

kinke commented 4 years ago

An 8-bit controller target? Interesting. :) - What triple do you use to generate the .ll files? I guess we'd only need to enable the AVR target for our LLVM to support direct .o emission and linking via -gcc=avr-gcc.

Have you checked whether the functions are emitted into separate sections in the object file? E.g., llvm-readelf --sections myobject.o. That's a prerequisite for ld's stripping via --gc-sections. I don't know how you generate the .o files from the .ll files, but as it's probably an LLVM tool, there might be a -function-sections (and -data-sections) command-line option for that.

WebFreak001 commented 4 years ago

I've tried two ways to build it so far, one of which already uses -function-sections and -data-sections:

// using .o files from ldc
// build with -output-o into obj/ folder first
avr-gcc -mmcu=atmega1284p -Wall -Wl,"--gc-sections" obj/*.o -o project.elf
avr-objcopy -O ihex -R .eeprom project.elf project.hex

This way I get the following elf sections for the module which contains the templated struct, causing the bloat:

separate .o files generated by LDC

```
Section Headers:
  [Nr] Name             Type                Address  Off    Size   ES Flg Lk Inf Al
  [ 0]                  NULL                00000000 000000 000000 00      0   0  0
  [ 1] .strtab          STRTAB              00000000 002178 006337 00      0   0  1
  // ^ no idea what this all is
  [ 2] .text            PROGBITS            00000000 000034 000000 00  AX  0   0  4
  [ 3] .progmem.data    PROGBITS            00000000 000034 000962 00   A  0   0  2
  // ^ all the unused functions
  [ 4] .linker-options  LLVM_LINKER_OPTIONS 00000000 000996 000000 00   E  0   0  1
  [ 5] .symtab          SYMTAB              00000000 000998 0017e0 10      1   2  4
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)
```

and a properly looking `app.o` file:

```
Section Headers:
  [Nr] Name               Type                Address  Off    Size   ES Flg Lk Inf Al
  [ 0]                    NULL                00000000 000000 000000 00      0   0  0
  [ 1] .strtab            STRTAB              00000000 000124 0000f4 00      0   0  1
  [ 2] .text              PROGBITS            00000000 000034 000000 00  AX  0   0  4
  [ 3] .progmem.data      PROGBITS            00000000 000034 00003a 00   A  0   0  2
  // ^ reasonable size!
  [ 4] .rela.progmem.data RELA                00000000 000100 000024 0c      6   3  4
  [ 5] .linker-options    LLVM_LINKER_OPTIONS 00000000 00006e 000000 00   E  0   0  1
  [ 6] .symtab            SYMTAB              00000000 000070 000090 10      1   4  4
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)
```

and the resulting .elf file:

```
Section Headers:
  [Nr] Name                     Type     Address  Off    Size   ES Flg Lk Inf Al
  [ 0]                          NULL     00000000 000000 000000 00      0   0  0
  [ 1] .data                    PROGBITS 00800100 000ac2 000000 00  WA  0   0  1
  [ 2] .text                    PROGBITS 00000000 000054 000a6e 00  AX  0   0  2
  // ^ way too big
  // v everything below here doesn't make it into the hex file anyway
  [ 3] .note.gnu.avr.deviceinfo NOTE     00000000 000ac4 000040 00      0   0  4
  [ 4] .debug_info              PROGBITS 00000000 000b04 000792 00      0   0  1
  [ 5] .debug_abbrev            PROGBITS 00000000 001296 000729 00      0   0  1
  [ 6] .debug_line              PROGBITS 00000000 0019bf 00001d 00      0   0  1
  [ 7] .debug_str               PROGBITS 00000000 0019dc 000296 00      0   0  1
  [ 8] .symtab                  SYMTAB   00000000 001c74 001ce0 10      9  16  4
  [ 9] .strtab                  STRTAB   00000000 003954 0066b1 00      0   0  1
  [10] .shstrtab                STRTAB   00000000 00a005 000071 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)
```

// using .ll files from ldc
// build with -output-ll into obj/ folder first

llvm-link -S -o project.ll obj/*.ll

opt -S -Oz --data-sections --function-sections --inline --strip-dead-prototypes --strip-dead-debug-info --strip project.ll --march=avr --mcpu=atmega1284p -o project.opt.ll

llc project.opt.ll --data-sections --function-sections --march=avr --mcpu=atmega1284p -filetype=obj -O2 -o project.o

avr-gcc -fdata-sections -mmcu=atmega1284p -Wall -Wl,"--gc-sections" project.o -o project.elf
avr-objcopy -O ihex -R .eeprom project.elf project.hex

This way I get the following elf sections: (of the combined project.o file here)

Combined .o file from the .ll files

```
Section Headers:
  [Nr] Name               Type                Address  Off    Size   ES Flg Lk Inf Al
  [ 0]                    NULL                00000000 000000 000000 00      0   0  0
  [ 1] .strtab            STRTAB              00000000 0021d8 006373 00      0   0  1
  // ^ not sure what this is
  [ 2] .text              PROGBITS            00000000 000034 000000 00  AX  0   0  4
  [ 3] .progmem.data      PROGBITS            00000000 000034 00096a 00   A  0   0  2
  // ^ overly large program (should only be like at most 20 words)
  [ 4] .rela.progmem.data RELA                00000000 0021cc 00000c 0c      8   3  4
  [ 5] .group             GROUP               00000000 0009a4 000008 04      8   4  4
  [ 6] .rodata._D3avr4fuse8__fuse_t6__initZ PROGBITS 00000000 00099e 000003 00  AG  0   0  1
  // ^ currently unused struct init which I didn't expect to even be put into the executable
  [ 7] .linker-options    LLVM_LINKER_OPTIONS 00000000 0009a1 000000 00   E  0   0  1
  [ 8] .symtab            SYMTAB              00000000 0009ac 001820 10      1   4  4
```

and the .elf file:

```
Section Headers:
  [Nr] Name                     Type     Address  Off    Size   ES Flg Lk Inf Al
  [ 0]                          NULL     00000000 000000 000000 00      0   0  0
  [ 1] .data                    PROGBITS 00800100 000a90 000000 00  WA  0   0  1
  [ 2] .text                    PROGBITS 00000000 000054 000a3c 00  AX  0   0  2
  // ^ too big, but slightly less than other method (maybe better optimization)
  // v everything below here doesn't make it into the hex file anyway
  [ 3] .note.gnu.avr.deviceinfo NOTE     00000000 000a90 000040 00      0   0  4
  [ 4] .debug_info              PROGBITS 00000000 000ad0 000792 00      0   0  1
  [ 5] .debug_abbrev            PROGBITS 00000000 001262 000729 00      0   0  1
  [ 6] .debug_line              PROGBITS 00000000 00198b 00001d 00      0   0  1
  [ 7] .debug_str               PROGBITS 00000000 0019a8 000296 00      0   0  1
  [ 8] .symtab                  SYMTAB   00000000 001c40 001cd0 10      9  15  4
  [ 9] .strtab                  STRTAB   00000000 003910 0066aa 00      0   0  1
  [10] .shstrtab                STRTAB   00000000 009fba 000071 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)
```

The full code is also at https://github.com/WebFreak001/avrd (requires a patched dub with the PR being open right now)

When reading the disassembly you can first see the good executable and then all the dead code though:

Disassembly + LLVM IR + optimized LLVM IR

```asm
Disassembly of section .text:

00000000 <__vectors>:
   0:  0c 94 fb 04  jmp 0x9f6 ; 0x9f6 <__ctors_end>
   4:  0c 94 1a 05  jmp 0xa34 ; 0xa34 <__bad_interrupt>
   8:  0c 94 1a 05  jmp 0xa34 ; 0xa34 <__bad_interrupt>
   c:  0c 94 1a 05  jmp 0xa34 ; 0xa34 <__bad_interrupt>
  10:  0c 94 1a 05  jmp 0xa34 ; 0xa34 <__bad_interrupt>
  14:  0c 94 1a 05  jmp 0xa34 ; 0xa34 <__bad_interrupt>
  18:  0c 94 1a 05  jmp 0xa34 ; 0xa34 <__bad_interrupt>
  1c:  0c 94 1a 05  jmp 0xa34 ; 0xa34 <__bad_interrupt>
  20:  0c 94 1a 05  jmp 0xa34 ; 0xa34 <__bad_interrupt>
  24:  0c 94 1a 05  jmp 0xa34 ; 0xa34 <__bad_interrupt>
  28:  0c 94 1a 05  jmp 0xa34 ; 0xa34 <__bad_interrupt>
  2c:  0c 94 1a 05  jmp 0xa34 ; 0xa34 <__bad_interrupt>
  30:  0c 94 1a 05  jmp 0xa34 ; 0xa34 <__bad_interrupt>
  34:  0c 94 1a 05  jmp 0xa34 ; 0xa34 <__bad_interrupt>
  38:  0c 94 1a 05  jmp 0xa34 ; 0xa34 <__bad_interrupt>
  3c:  0c 94 1a 05  jmp 0xa34 ; 0xa34 <__bad_interrupt>
  40:  0c 94 1a 05  jmp 0xa34 ; 0xa34 <__bad_interrupt>
  44:  0c 94 1a 05  jmp 0xa34 ; 0xa34 <__bad_interrupt>
  48:  0c 94 1a 05  jmp 0xa34 ; 0xa34 <__bad_interrupt>
  4c:  0c 94 1a 05  jmp 0xa34 ; 0xa34 <__bad_interrupt>
  50:  0c 94 1a 05  jmp 0xa34 ; 0xa34 <__bad_interrupt>
  54:  0c 94 1a 05  jmp 0xa34 ; 0xa34 <__bad_interrupt>
  58:  0c 94 1a 05  jmp 0xa34 ; 0xa34 <__bad_interrupt>
  5c:  0c 94 1a 05  jmp 0xa34 ; 0xa34 <__bad_interrupt>
  60:  0c 94 1a 05  jmp 0xa34 ; 0xa34 <__bad_interrupt>
  64:  0c 94 1a 05  jmp 0xa34 ; 0xa34 <__bad_interrupt>
  68:  0c 94 1a 05  jmp 0xa34 ; 0xa34 <__bad_interrupt>
  6c:  0c 94 1a 05  jmp 0xa34 ; 0xa34 <__bad_interrupt>
  70:  0c 94 1a 05  jmp 0xa34 ; 0xa34 <__bad_interrupt>
  74:  0c 94 1a 05  jmp 0xa34 ; 0xa34 <__bad_interrupt>
  78:  0c 94 1a 05  jmp 0xa34 ; 0xa34 <__bad_interrupt>
  7c:  0c 94 1a 05  jmp 0xa34 ; 0xa34 <__bad_interrupt>
  80:  0c 94 1a 05  jmp 0xa34 ; 0xa34 <__bad_interrupt>
  84:  0c 94 1a 05  jmp 0xa34 ; 0xa34 <__bad_interrupt>
  88:  0c 94 1a 05  jmp 0xa34 ; 0xa34 <__bad_interrupt>

0000008c :
  8c:  8f ef        ldi r24, 0xFF ; 255
  8e:  84 b9        out 0x04, r24 ; 4

00000090 :
  90:  85 b9        out 0x05, r24 ; 5
  92:  fe cf        rjmp .-4 ; 0x90

... around 1300 similar lines before this

00000968 <_D3avr8sfr_defs__T11VolatileRefTtVPti204ZQx__T3ptrZQfMFNaNbNiNfZQBe>:
 968:  8c ec        ldi r24, 0xCC ; 204
 96a:  90 e0        ldi r25, 0x00 ; 0
 96c:  08 95        ret

0000096e <_D3avr8sfr_defs__T11VolatileRefTtVPti204ZQx__T8opAssignZQkMFNbNiNftZv>:
 96e:  70 93 cd 00  sts 0x00CD, r23 ; 0x8000cd <__TEXT_REGION_LENGTH__+0x7e00cd>
 972:  60 93 cc 00  sts 0x00CC, r22 ; 0x8000cc <__TEXT_REGION_LENGTH__+0x7e00cc>
 976:  08 95        ret

00000978 <_D3avr8sfr_defs__T11VolatileRefThVPhi204ZQx__T3getZQfMFNbNiNfZh>:
 978:  80 91 cc 00  lds r24, 0x00CC ; 0x8000cc <__TEXT_REGION_LENGTH__+0x7e00cc>
 97c:  99 27        eor r25, r25
 97e:  08 95        ret

00000980 <_D3avr8sfr_defs__T11VolatileRefThVPhi204ZQx__T3ptrZQfMFNaNbNiNfZQBe>:
 980:  8c ec        ldi r24, 0xCC ; 204
 982:  90 e0        ldi r25, 0x00 ; 0
 984:  08 95        ret

00000986 <_D3avr8sfr_defs__T11VolatileRefThVPhi204ZQx__T8opAssignZQkMFNbNiNfhZv>:
 986:  60 93 cc 00  sts 0x00CC, r22 ; 0x8000cc <__TEXT_REGION_LENGTH__+0x7e00cc>
 98a:  08 95        ret

0000098c <_D3avr8sfr_defs__T11VolatileRefThVPhi205ZQx__T3getZQfMFNbNiNfZh>:
 98c:  80 91 cd 00  lds r24, 0x00CD ; 0x8000cd <__TEXT_REGION_LENGTH__+0x7e00cd>
 990:  99 27        eor r25, r25
 992:  08 95        ret

00000994 <_D3avr8sfr_defs__T11VolatileRefThVPhi205ZQx__T3ptrZQfMFNaNbNiNfZQBe>:
 994:  8d ec        ldi r24, 0xCD ; 205
 996:  90 e0        ldi r25, 0x00 ; 0
 998:  08 95        ret

0000099a <_D3avr8sfr_defs__T11VolatileRefThVPhi205ZQx__T8opAssignZQkMFNbNiNfhZv>:
 99a:  60 93 cd 00  sts 0x00CD, r22 ; 0x8000cd <__TEXT_REGION_LENGTH__+0x7e00cd>
 99e:  08 95        ret

000009a0 <_D3avr8sfr_defs__T11VolatileRefThVPhi206ZQx__T3getZQfMFNbNiNfZh>:
 9a0:  80 91 ce 00  lds r24, 0x00CE ; 0x8000ce <__TEXT_REGION_LENGTH__+0x7e00ce>
 9a4:  99 27        eor r25, r25
 9a6:  08 95        ret

000009a8 <_D3avr8sfr_defs__T11VolatileRefThVPhi206ZQx__T3ptrZQfMFNaNbNiNfZQBe>:
 9a8:  8e ec        ldi r24, 0xCE ; 206
 9aa:  90 e0        ldi r25, 0x00 ; 0
 9ac:  08 95        ret

000009ae <_D3avr8sfr_defs__T11VolatileRefThVPhi206ZQx__T8opAssignZQkMFNbNiNfhZv>:
 9ae:  60 93 ce 00  sts 0x00CE, r22 ; 0x8000ce <__TEXT_REGION_LENGTH__+0x7e00ce>
 9b2:  08 95        ret

000009b4 <_D3avr8sfr_defs__T11VolatileRefThVPhi93ZQw__T3getZQfMFNbNiNfZh>:
 9b4:  8d b7        in r24, 0x3d ; 61
 9b6:  99 27        eor r25, r25
 9b8:  08 95        ret

000009ba <_D3avr8sfr_defs__T11VolatileRefThVPhi93ZQw__T3ptrZQfMFNaNbNiNfZQBd>:
 9ba:  8d e5        ldi r24, 0x5D ; 93
 9bc:  90 e0        ldi r25, 0x00 ; 0
 9be:  08 95        ret

000009c0 <_D3avr8sfr_defs__T11VolatileRefThVPhi93ZQw__T8opAssignZQkMFNbNiNfhZv>:
 9c0:  6d bf        out 0x3d, r22 ; 61
 9c2:  08 95        ret

000009c4 <_D3avr8sfr_defs__T11VolatileRefTtVPti93ZQw__T3getZQfMFNbNiNfZt>:
 9c4:  8d b7        in r24, 0x3d ; 61
 9c6:  9e b7        in r25, 0x3e ; 62

... around 100 lines after this

000009f6 <__ctors_end>:
 9f6:  11 24        eor r1, r1
 9f8:  1f be        out 0x3f, r1 ; 63
 9fa:  cf ef        ldi r28, 0xFF ; 255
 9fc:  d0 e4        ldi r29, 0x40 ; 64
 9fe:  de bf        out 0x3e, r29 ; 62
 a00:  cd bf        out 0x3d, r28 ; 61

00000a02 <__do_copy_data>:
 a02:  11 e0        ldi r17, 0x01 ; 1
 a04:  a0 e0        ldi r26, 0x00 ; 0
 a06:  b1 e0        ldi r27, 0x01 ; 1
 a08:  ec e3        ldi r30, 0x3C ; 60
 a0a:  fa e0        ldi r31, 0x0A ; 10
 a0c:  00 e0        ldi r16, 0x00 ; 0
 a0e:  0b bf        out 0x3b, r16 ; 59
 a10:  02 c0        rjmp .+4 ; 0xa16 <__do_copy_data+0x14>
 a12:  07 90        elpm r0, Z+
 a14:  0d 92        st X+, r0
 a16:  a0 30        cpi r26, 0x00 ; 0
 a18:  b1 07        cpc r27, r17
 a1a:  d9 f7        brne .-10 ; 0xa12 <__do_copy_data+0x10>

00000a1c <__do_clear_bss>:
 a1c:  21 e0        ldi r18, 0x01 ; 1
 a1e:  a0 e0        ldi r26, 0x00 ; 0
 a20:  b1 e0        ldi r27, 0x01 ; 1
 a22:  01 c0        rjmp .+2 ; 0xa26 <.do_clear_bss_start>

00000a24 <.do_clear_bss_loop>:
 a24:  1d 92        st X+, r1

00000a26 <.do_clear_bss_start>:
 a26:  a0 30        cpi r26, 0x00 ; 0
 a28:  b2 07        cpc r27, r18
 a2a:  e1 f7        brne .-8 ; 0xa24 <.do_clear_bss_loop>
 a2c:  0e 94 46 00  call 0x8c ; 0x8c
 a30:  0c 94 1c 05  jmp 0xa38 ; 0xa38 <_exit>

00000a34 <__bad_interrupt>:
 a34:  0c 94 00 00  jmp 0 ; 0x0 <__vectors>

00000a38 <_exit>:
 a38:  f8 94        cli

00000a3a <__stop_program>:
 a3a:  ff cf        rjmp .-2 ; 0xa3a <__stop_program>
```

The __ctors_end code and following also seems to be in C elf binaries and doesn't break anything (I think it default initializes some registers and memory)

With all the garbage inlined functions it's 2622 bytes of flash memory now (instead of 176 bytes of flash memory)

Here is also what the linked together unoptimized .ll file looks like:

```llvm
; ModuleID = 'llvm-link'
source_filename = "llvm-link"
target datalayout = "e-P1-p:16:8-i8:8-i16:8-i32:8-i64:8-f32:8-f64:8-n8-a:8"
target triple = "avr"

%avr.fuse.__fuse_t = type { i8, i8, i8 }
%"avr.sfr_defs.VolatileRef!(ubyte, cast(ubyte*)36u).VolatileRef" = type { [1 x i8] }

$main = comdat any

$_D3avr8sfr_defs__T11VolatileRefThVPhi32ZQw__T3getZQfMFNbNiNfZh = comdat any

$_D3avr8sfr_defs__T11VolatileRefThVPhi32ZQw__T3ptrZQfMFNaNbNiNfZQBd = comdat any

$_D3avr8sfr_defs__T11VolatileRefThVPhi32ZQw__T8opAssignZQkMFNbNiNfhZv = comdat any

... around 700 lines of comdat

$_D3avr8sfr_defs__T11VolatileRefThVPhi95ZQw__T3ptrZQfMFNaNbNiNfZQBd = comdat any

$_D3avr8sfr_defs__T11VolatileRefThVPhi95ZQw__T8opAssignZQkMFNbNiNfhZv = comdat any

$_D3avr4fuse8__fuse_t6__initZ = comdat any

@_D3avr4fuse8__fuse_t6__initZ = hidden local_unnamed_addr constant %avr.fuse.__fuse_t { i8 1, i8 1, i8 1 }, comdat, align 1

; Function Attrs: noreturn
define hidden i32 @main() local_unnamed_addr addrspace(1) #0 comdat {
  %.structliteral = alloca %"avr.sfr_defs.VolatileRef!(ubyte, cast(ubyte*)36u).VolatileRef", align 1
  %.structliteral1 = alloca %"avr.sfr_defs.VolatileRef!(ubyte, cast(ubyte*)36u).VolatileRef", align 1
  %1 = getelementptr inbounds %"avr.sfr_defs.VolatileRef!(ubyte, cast(ubyte*)36u).VolatileRef", %"avr.sfr_defs.VolatileRef!(ubyte, cast(ubyte*)36u).VolatileRef"* %.structliteral, i16 0, i32 0, i16 0
  store i8 0, i8* %1, align 1
  call addrspace(1) void @_D3avr8sfr_defs__T11VolatileRefThVPhi36ZQw__T8opAssignZQkMFNbNiNfhZv(%"avr.sfr_defs.VolatileRef!(ubyte, cast(ubyte*)36u).VolatileRef"* nonnull %.structliteral, i8 zeroext -1) #1
  %2 = getelementptr inbounds %"avr.sfr_defs.VolatileRef!(ubyte, cast(ubyte*)36u).VolatileRef", %"avr.sfr_defs.VolatileRef!(ubyte, cast(ubyte*)36u).VolatileRef"* %.structliteral1, i16 0, i32 0, i16 0
  br label %forcond

forcond:                                          ; preds = %forcond, %0
  store i8 0, i8* %2, align 1
  call addrspace(1) void @_D3avr8sfr_defs__T11VolatileRefThVPhi37ZQw__T8opAssignZQkMFNbNiNfhZv(%"avr.sfr_defs.VolatileRef!(ubyte, cast(ubyte*)36u).VolatileRef"* nonnull %.structliteral1, i8 zeroext 64) #1
  br label %forcond
}

; Function Attrs: alwaysinline
define weak_odr hidden zeroext i8 @_D3avr8sfr_defs__T11VolatileRefThVPhi32ZQw__T3getZQfMFNbNiNfZh(%"avr.sfr_defs.VolatileRef!(ubyte, cast(ubyte*)36u).VolatileRef"* nonnull %.this_arg) local_unnamed_addr addrspace(1) #1 comdat {
  %1 = load volatile i8, i8* inttoptr (i16 32 to i8*), align 32
  ret i8 %1
}

; Function Attrs: alwaysinline
define weak_odr hidden i8* @_D3avr8sfr_defs__T11VolatileRefThVPhi32ZQw__T3ptrZQfMFNaNbNiNfZQBd(%"avr.sfr_defs.VolatileRef!(ubyte, cast(ubyte*)36u).VolatileRef"* nonnull %.this_arg) local_unnamed_addr addrspace(1) #1 comdat {
  ret i8* inttoptr (i16 32 to i8*)
}

; Function Attrs: alwaysinline
define weak_odr hidden void @_D3avr8sfr_defs__T11VolatileRefThVPhi32ZQw__T8opAssignZQkMFNbNiNfhZv(%"avr.sfr_defs.VolatileRef!(ubyte, cast(ubyte*)36u).VolatileRef"* nonnull %.this_arg, i8 zeroext %value_arg) local_unnamed_addr addrspace(1) #1 comdat {
  store volatile i8 %value_arg, i8* inttoptr (i16 32 to i8*), align 32
  ret void
}

; Function Attrs: alwaysinline
define weak_odr hidden zeroext i8 @_D3avr8sfr_defs__T11VolatileRefThVPhi33ZQw__T3getZQfMFNbNiNfZh(%"avr.sfr_defs.VolatileRef!(ubyte, cast(ubyte*)36u).VolatileRef"* nonnull %.this_arg) local_unnamed_addr addrspace(1) #1 comdat {
  %1 = load volatile i8, i8* inttoptr (i16 33 to i8*), align 1
  ret i8 %1
}

... around 2000 lines of those

attributes #0 = { noreturn "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "target-cpu"="atmega1284p" "unsafe-fp-math"="false" }
attributes #1 = { alwaysinline "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "target-cpu"="atmega1284p" "unsafe-fp-math"="false" }

!llvm.linker.options = !{}
!llvm.ident = !{!0, !0, !0, !0, !0, !0, !0, !0, !0, !0, !0, !0}

!0 = !{!"ldc version 1.18.0-git-2ff7d49"}
```

and how it looks after optimizing:

```llvm
; ModuleID = '.dub/blinking.ll'
source_filename = "llvm-link"
target datalayout = "e-P1-p:16:8-i8:8-i16:8-i32:8-i64:8-f32:8-f64:8-n8-a:8"
target triple = "avr"

%0 = type { i8, i8, i8 }
%1 = type { [1 x i8] }

$main = comdat any

$_D3avr8sfr_defs__T11VolatileRefThVPhi32ZQw__T3getZQfMFNbNiNfZh = comdat any

$_D3avr8sfr_defs__T11VolatileRefThVPhi32ZQw__T3ptrZQfMFNaNbNiNfZQBd = comdat any

$_D3avr8sfr_defs__T11VolatileRefThVPhi32ZQw__T8opAssignZQkMFNbNiNfhZv = comdat any

$_D3avr8sfr_defs__T11VolatileRefThVPhi33ZQw__T3getZQfMFNbNiNfZh = comdat any

... around 700 lines of comdat

$_D3avr4fuse8__fuse_t6__initZ = comdat any

@_D3avr4fuse8__fuse_t6__initZ = hidden local_unnamed_addr constant %0 { i8 1, i8 1, i8 1 }, comdat, align 1

; Function Attrs: nofree norecurse noreturn nounwind
define hidden i32 @main() local_unnamed_addr addrspace(1) #0 comdat {
  store volatile i8 -1, i8* inttoptr (i16 36 to i8*), align 4
  br label %1

1:                                                ; preds = %1, %0
  store volatile i8 64, i8* inttoptr (i16 37 to i8*), align 1
  br label %1
}

; Function Attrs: alwaysinline
define weak_odr hidden zeroext i8 @_D3avr8sfr_defs__T11VolatileRefThVPhi32ZQw__T3getZQfMFNbNiNfZh(%1* nonnull) local_unnamed_addr addrspace(1) #1 comdat {
  %2 = load volatile i8, i8* inttoptr (i16 32 to i8*), align 32
  ret i8 %2
}

; Function Attrs: alwaysinline
define weak_odr hidden i8* @_D3avr8sfr_defs__T11VolatileRefThVPhi32ZQw__T3ptrZQfMFNaNbNiNfZQBd(%1* nonnull) local_unnamed_addr addrspace(1) #1 comdat {
  ret i8* inttoptr (i16 32 to i8*)
}

... around 2000 lines of those

attributes #0 = { nofree norecurse noreturn nounwind "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "target-cpu"="atmega1284p" "unsafe-fp-math"="false" }
attributes #1 = { alwaysinline "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "target-cpu"="atmega1284p" "unsafe-fp-math"="false" }

!llvm.linker.options = !{}
!llvm.ident = !{!0, !0, !0, !0, !0, !0, !0, !0, !0, !0, !0, !0}

!0 = !{!"ldc version 1.18.0-git-2ff7d49"}
```

so the main in the optimized llvm ir also looks great, but it still compiles and links all the unused inlined files.

So I don't know what's causing them to be stuck in the binary or why nothing seems to have an effect.

kinke commented 4 years ago

As expected, you seem not to end up with separate sections. This:

void foo() {}
void bar() {}

yields something like this with ldc2 -c -mtriple=x86_64-pc-linux-gnu current.d && llvm-readelf --sections current.o:

  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .strtab           STRTAB          0000000000000000 0003d8 000155 00      0   0  1
  [ 2] .text             PROGBITS        0000000000000000 000040 000000 00  AX  0   0  4
  [ 3] .group            GROUP           0000000000000000 000168 000008 04     24   8  4
  [ 4] .text._D7current3fooFZv PROGBITS  0000000000000000 000040 000006 00 AXG  0   0 16
  [ 5] .group            GROUP           0000000000000000 000170 000008 04     24   7  4
  [ 6] .text._D7current3barFZv PROGBITS  0000000000000000 000050 000006 00 AXG  0   0 16
  [ 7] .text.ldc.register_dso PROGBITS   0000000000000000 000060 000043 00  AX  0   0 16
  [ 8] .rela.text.ldc.register_dso RELA  0000000000000000 0002e8 000060 18     24   7  8
  [ 9] .group            GROUP           0000000000000000 000178 000008 04     24   6  4
  [10] .data._D7current12__ModuleInfoZ PROGBITS 0000000000000000 0000a8 000010 00 WAG  0   0  8
  [11] __minfo           PROGBITS        0000000000000000 0000b8 000008 00  WA  0   0  8
  [12] .rela__minfo      RELA            0000000000000000 000348 000018 18     24  11  8
  [13] .bss.ldc.dso_slot NOBITS          0000000000000000 0000c0 000008 00  WA  0   0  8
  [14] .group            GROUP           0000000000000000 000180 000014 04     24  13  4
  [15] .init_array       INIT_ARRAY      0000000000000000 0000c0 000008 00 WAG  0   0  8
  [16] .rela.init_array  RELA            0000000000000000 000360 000018 18   G 24  15  8
  [17] .fini_array       FINI_ARRAY      0000000000000000 0000c8 000008 00 WAG  0   0  8
  [18] .rela.fini_array  RELA            0000000000000000 000378 000018 18   G 24  17  8
  [19] .linker-options   LLVM_LINKER_OPTIONS 0000000000000000 0000d0 000000 00   E  0   0  1
  [20] .comment          PROGBITS        0000000000000000 0000d0 000026 01  MS  0   0  1
  [21] .note.GNU-stack   PROGBITS        0000000000000000 0000f6 000000 00      0   0  1
  [22] .eh_frame         X86_64_UNWIND   0000000000000000 0000f8 000070 00   A  0   0  8
  [23] .rela.eh_frame    RELA            0000000000000000 000390 000048 18     24  22  8
  [24] .symtab           SYMTAB          0000000000000000 000198 000150 18      1   5  8

Notice the various .text.* sections (which the linker will merge into a single final .text section in the ELF binary). Your object files feature a 0-sized .text section, and a .progmem.data section at the same offset which seems to be the real deal, but it's all one 'big' section, so the linker cannot strip anything.

kinke commented 4 years ago

In case this is a bug or limitation of the AVR LLVM backend, you can also play around with custom sections:

import ldc.attributes;

@section(".progmem.data." ~ foo.mangleof)
void foo() {}
// => emitted into object file section `.progmem.data._D7current3fooFZv`

WebFreak001 commented 4 years ago

thank you! The section trick did wonders and completely eliminated all junk.
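
For readers wanting to try the same thing, here is a minimal sketch of the section trick applied to more than one function. The helper names `bitMask` and `clearMask` are hypothetical and not taken from avrd, and the `.progmem.data.` prefix is simply the one suggested above for AVR object files; other targets may need a different prefix or section-name format. The stand-in `section` struct exists only so the sketch still compiles with compilers other than LDC, where it has no effect.

```d
// On LDC use the real attribute; elsewhere a do-nothing stand-in UDA
// (an assumption made purely so this sketch compiles everywhere).
version (LDC)
    import ldc.attributes : section;
else
    struct section { string name; }

// Hypothetical helpers. Each one names its own section after its mangled
// name, so every function lands in a distinct section of the object file.
@section(".progmem.data." ~ bitMask.mangleof)
ubyte bitMask(ubyte pin)
{
    return cast(ubyte)(1 << pin);
}

@section(".progmem.data." ~ clearMask.mangleof)
ubyte clearMask(ubyte pin)
{
    return cast(ubyte) ~(1 << pin);
}
```

With one section per function, a linker run with `--gc-sections` can discard whichever of these is never referenced, which is what the single merged `.progmem.data` section prevented.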

This does seem a little bit like a hack though, isn't there a way to do this on LLVM level already instead of only removing it at the linker? Otherwise I would close this now as this seems to work exactly like I need it to

kinke commented 4 years ago

Inlining without emitting the inlined functions at all in IR would most likely entail something as ugly as DMD's approach, inlining at the AST level, and that's not likely going to happen.

JohanEngelen commented 4 years ago

Functions that are always inlined can be marked with available_externally linkage in LLVM IR, such that they are not emitted in the object file. It is tricky: for example, if you take the address of such a function, then you'll get a linker error. Currently I don't think we have any means for the user to explicitly apply the available_externally linkage type to functions.
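
A small sketch of the two kinds of use being contrasted here. It does not apply `available_externally` itself (as noted, there is currently no user-facing way to do that); it only shows which use could still be satisfied if no function body were emitted and which one needs a real symbol. The names `twice`, `viaDirectCall`, and `viaPointer` are hypothetical.

```d
pragma(inline, true)
int twice(int x)
{
    return 2 * x;
}

// A direct call can be replaced by the inlined body,
// so no standalone copy of `twice` is required for it.
int viaDirectCall(int x)
{
    return twice(x);
}

// Taking the address requires an emitted function to point at.
// If `twice` had no emitted definition anywhere, this is the kind
// of use that would end in an undefined-symbol linker error.
int viaPointer(int x)
{
    int function(int) fp = &twice;
    return fp(x);
}
```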

WebFreak001 commented 3 years ago

bump, wanted to make a small library to test if x is typeof(x).init (because I have both long type and long variable names which I wanted to cut down on), but it bloats the executable with lots of isDefault functions

kinke commented 3 years ago

Try using a function literal like this:

pragma(inline, true)
alias isDefault = (auto ref x) => x is typeof(x).init;

WebFreak001 commented 3 years ago

awesome, that works, even as operator overload!
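
For completeness, a short usage sketch of the function-literal alias suggested above. The `Settings` struct and the `untouched` helper are hypothetical and only there to exercise `isDefault` with both a built-in type and a struct.

```d
// The alias exactly as suggested in the thread.
pragma(inline, true)
alias isDefault = (auto ref x) => x is typeof(x).init;

// Hypothetical type used only for illustration.
struct Settings
{
    int retries;
    bool verbose;
}

// True while `s` still holds Settings.init.
bool untouched(Settings s)
{
    return isDefault(s);
}
```

Because the literal takes an untyped parameter, it behaves like a template and accepts any type, so one alias covers `int`, `Settings`, and so on.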