llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.25k stars 11.66k forks source link

[ELF] - FreeBSD lld linked loader hangs (sys/boot/efi/boot1). #30561

Closed llvmbot closed 7 years ago

llvmbot commented 7 years ago
Bugzilla Link 31213
Resolution WONTFIX
Resolved on Dec 01, 2016 08:53
Version unspecified
OS FreeBSD
Blocks llvm/llvm-project#23588
Reporter LLVM Bugzilla Contributor
CC @emaste,@rui314

Extended Description

sys/boot/efi/boot1 loader hangs in QEMU if linked with lld.

I was able to reduce its code to minimal case:

include <sys/param.h>

include <machine/elf.h>

include <machine/stdarg.h>

include

include

include

include "boot_module.h"

EFI_STATUS efi_main(EFI_HANDLE Ximage, EFI_SYSTEM_TABLE* Xsystab);

EFI_SYSTEM_TABLE systab; EFI_BOOT_SERVICES bs; static EFI_HANDLE *image;

EFI_STATUS efi_main(EFI_HANDLE Ximage, EFI_SYSTEM_TABLE Xsystab) { SIMPLE_TEXT_OUTPUT_INTERFACE conout = NULL; systab = Xsystab; image = Ximage; bs = Xsystab->BootServices;

//conout = Xsystab->ConOut; //1 conout = systab->ConOut; //2

conout->Reset(conout, TRUE); return EFI_SUCCESS; }

Now note that if line 1 is used, lld linked loader hangs. If line 2 is used, it does not. ld.bfd linked loader feels fine with any of these lines.

I am investigating the difference in output now.

emaste commented 7 years ago

The failure was introduced by r288107. I will follow up with details to the mailing list or a separate PR.

That's now PR 31221

llvmbot commented 7 years ago

Something is still not quite right and linking the FreeBSD bootloaders with that change does not work for me, although George reported success.

Yes, minimal scenario from this PR as well as building+linking complete source of usr/src/sys/boot/efi/boot1 loader is fixed for me with https://reviews.freebsd.org/D8681 applied.

Tested in a next way: root@freebsd:/usr/src/sys/boot/efi/boot1 make clean make objcopy \ -j .peheader -j .text -j .sdata -j .data \ -j .dynamic -j .dynsym -j .rel.dyn \ -j .rela.dyn -j .reloc -j .eh_frame \ --output-target=efi-app-x86_64 boot1.sym.full loader.efi

now loader.efi is workable.

Though I still observe issue with objcopy call from original makefile: objcopy --only-keep-debug boot1.sym.full boot1.sym.debug BFD: boot1.sym.debug: section .dynsym' can't be allocated in segment 4 objcopy: boot1.sym.debug: Bad value BFD: boot1.sym.debug: section.dynsym' can't be allocated in segment 4 objcopy: boot1.sym.debug: Bad value *** Error code 1 That issue can be fixed with passing -N to linker and after that result boot1.efi also works for me.

Ah, yes. That is an oddity of ld.bfd I had noticed before. If we can fix the freebsd code instead of implementing the same oddity in ld.lld that would be awesome.

Yep, I also noticed that earlier. I just do not think it is an oddity only of bfd, I think gold also do that since loader linked from source still has many left relocations, what means not all of them were relaxed and looks it "worked" because of the same reasons. (though I did not re-check) That is why I at first I supposed we might want to do the same, but did not find anything in ABI or somewhere else about storing addends in relocation targets.

I am closing this, since this one issue does not requires fixing on LLD side and we can open separate PR for others if any.

emaste commented 7 years ago

/libexec/ld-elf.so.1 is broken by a change between r287996 and r288228. I'm bisecting now.

The failure was introduced by r288107. I will follow up with details to the mailing list or a separate PR.

emaste commented 7 years ago

Ah, yes. That is an oddity of ld.bfd I had noticed before. If we can fix the freebsd code instead of implementing the same oddity in ld.lld that would be awesome.

Indeed. It's definitely a bug for FreeBSD to parse rela relocations as if they were rel. It would be hard to justify implementing the oddity in LLD, I think.

https://reviews.freebsd.org/D8681 fixes the FreeBSD self reloc code.

Something is still not quite right and linking the FreeBSD bootloaders with that change does not work for me, although George reported success. In my LLD-linked FreeBSD world+kernel test I'm seeing nearly all userland applications segfaulting, so I'm worried something has regressed in LLD in the last day or two. I'm rebuilding everything now to investigate further.

:-(

/libexec/ld-elf.so.1 is broken by a change between r287996 and r288228. I'm bisecting now.

llvmbot commented 7 years ago

So, bfd has the same two R_X86_64_RELATIVE but instead of using r_addend it writes the addend to the got entry? That is odd.

It's not instead of, it's both.

ld.bfd writes the same value in r_addend and at the relocation target in the got entry. Then if the runtime loader (self-reloc code in the bootloader case) incorrectly processes the relocation entries as if they were rel relocations, it "works." The FreeBSD EFI bootloader has a bug and relies on this GNU ld quirk right now; this is addressed by the FreeBSD code review linked above.

Ah, yes. That is an oddity of ld.bfd I had noticed before. If we can fix the freebsd code instead of implementing the same oddity in ld.lld that would be awesome.

Something is still not quite right and linking the FreeBSD bootloaders with that change does not work for me, although George reported success. In my LLD-linked FreeBSD world+kernel test I'm seeing nearly all userland applications segfaulting, so I'm worried something has regressed in LLD in the last day or two. I'm rebuilding everything now to investigate further.

:-(

emaste commented 7 years ago

So, bfd has the same two R_X86_64_RELATIVE but instead of using r_addend it writes the addend to the got entry? That is odd.

It's not instead of, it's both.

ld.bfd writes the same value in r_addend and at the relocation target in the got entry. Then if the runtime loader (self-reloc code in the bootloader case) incorrectly processes the relocation entries as if they were rel relocations, it "works." The FreeBSD EFI bootloader has a bug and relies on this GNU ld quirk right now; this is addressed by the FreeBSD code review linked above.

Something is still not quite right and linking the FreeBSD bootloaders with that change does not work for me, although George reported success. In my LLD-linked FreeBSD world+kernel test I'm seeing nearly all userland applications segfaulting, so I'm worried something has regressed in LLD in the last day or two. I'm rebuilding everything now to investigate further.

llvmbot commented 7 years ago

Looks nothing should be fixed on lld side. Though I wonder why ld has such behavior. I think we can close it then ?

It will be good to confirm that it indeed works with FreeBSD D8681 applied, when linked with LLD.

I don't know why GNU ld would have this odd behaviour, but I can see no reason we'd want to have LLD do this.

So, bfd has the same two R_X86_64_RELATIVE but instead of using r_addend it writes the addend to the got entry? That is odd.

llvmbot commented 7 years ago

Looks something with .got. It placed to .sdata by script. Out .got has 2 zero values, bfd has 2 non-zero values. gold does not generate .got because converts mov->lea.

If I edit slightly our code to match the gold, issue dissapears: if (Type ! = R_X86_64_GOTPCREL && // New condition. Type != R_X86_64_GOTPCRELX && Type != R_X86_64_REX_GOTPCRELX) return RelExpr;

Not relaxing R_X86_64_GOTPCREL is probably intentional, but this is probably not the root cause as you point out in the following messages.

emaste commented 7 years ago

Looks nothing should be fixed on lld side. Though I wonder why ld has such behavior. I think we can close it then ?

It will be good to confirm that it indeed works with FreeBSD D8681 applied, when linked with LLD.

I don't know why GNU ld would have this odd behaviour, but I can see no reason we'd want to have LLD do this.

llvmbot commented 7 years ago

So FreeBSD guys going to fix that on their side: https://reviews.freebsd.org/D8681

Looks nothing should be fixed on lld side. Though I wonder why ld has such behavior. I think we can close it then ?

llvmbot commented 7 years ago

Ok, I think now the reason is pretty clear for me.

Loader has next relocations: Relocation section '.rela.dyn' at offset 0x7000 contains 2 entries: Offset Info Type Sym. Value Sym. Name + Addend 000000004000 000000000008 R_X86_64_RELATIVE 0000000000003010 000000004008 000000000008 R_X86_64_RELATIVE 0000000000003008

0x4000 is the .sdata, where .got lives: [Nr] Name Type Address Offset Size EntSize Flags Link Info Align [ 5] .sdata PROGBITS 0000000000004000 00005000 0000000000000010 0000000000000000 WA 0 0 8

Loader contains code for self relocations: Elf_Word relsz, relent; Elf_Addr newaddr; ElfW_Rel rel; ElfW_Dyn *dynp;

/*
 * Find the relocation address, its size and the relocation entry.
 */
relsz = 0;
relent = 0;
for (dynp = dynamic; dynp->d_tag != DT_NULL; dynp++) {
    switch (dynp->d_tag) {
    case DT_REL:
    case DT_RELA:
        rel = (ElfW_Rel *)(dynp->d_un.d_ptr + baseaddr);
        break;
    case DT_RELSZ:
    case DT_RELASZ:
        relsz = dynp->d_un.d_val;
        break;
    case DT_RELENT:
    case DT_RELAENT:
        relent = dynp->d_un.d_val;
        break;
    default:
        break;
    }
}

/*
 * Perform the actual relocation.
 */
for (; relsz > 0; relsz -= relent) {
    switch (ELFW_R_TYPE(rel->r_info)) {
    case RELOC_TYPE_NONE:
        /* No relocation needs be performed. */
        break;

    case RELOC_TYPE_RELATIVE:
        newaddr = (Elf_Addr *)(rel->r_offset + baseaddr);

ifdef ELF_RELA

        /*
         * For R_AARCH64_RELATIVE we need to calculate the
         * delta between the address we are run from and the
         * address we are linked at. As the latter is 0 we
         * just use the address we are run from for this.
         */
        *newaddr = baseaddr + rel->r_addend;

else

        /* Address relative to the base address. */
        *newaddr += baseaddr;

endif

        break;
    default:
        /* XXX: do we need other relocations ? */
        break;
    }
    rel = (ElfW_Rel *)(void *)((caddr_t) rel + relent);
}

Notice that if ELF_RELA is not defined, it uses *newaddr += baseaddr; And ELF_RELA is not defined for amd64:

if defined(aarch64)

define ElfW_Rel Elf64_Rela

define ElfW_Dyn Elf64_Dyn

define ELFW_R_TYPE ELF64_R_TYPE

define ELF_RELA

elif defined(arm) || defined(i386)

define ElfW_Rel Elf32_Rel

define ElfW_Dyn Elf32_Dyn

define ELFW_R_TYPE ELF32_R_TYPE

elif defined(amd64)

define ElfW_Rel Elf64_Rel

define ElfW_Dyn Elf64_Dyn

define ELFW_R_TYPE ELF64_R_TYPE

else

error architecture not supported

endif

if defined(aarch64)

define RELOC_TYPE_NONE R_AARCH64_NONE

define RELOC_TYPE_RELATIVE R_AARCH64_RELATIVE

elif defined(amd64)

define RELOC_TYPE_NONE R_X86_64_NONE

define RELOC_TYPE_RELATIVE R_X86_64_RELATIVE

elif defined(arm)

define RELOC_TYPE_NONE R_ARM_NONE

define RELOC_TYPE_RELATIVE R_ARM_RELATIVE

elif defined(i386)

define RELOC_TYPE_NONE R_386_NONE

define RELOC_TYPE_RELATIVE R_386_RELATIVE

endif

So it uses the value written in the got instead of relocation addend. We have zero values in got, what is different from bfd. That why issue happens it seems.

llvmbot commented 7 years ago

Looks something with .got. It placed to .sdata by script. Out .got has 2 zero values, bfd has 2 non-zero values. gold does not generate .got because converts mov->lea.

If I edit slightly our code to match the gold, issue dissapears: if (Type ! = R_X86_64_GOTPCREL && // New condition. Type != R_X86_64_GOTPCRELX && Type != R_X86_64_REX_GOTPCRELX) return RelExpr;

llvmbot commented 7 years ago

Can you attach the cpio of both cases?

Submitted.

I was incorrect here: conout = Xsystab->ConOut; //1 conout = systab->ConOut; //2

If line 1 is used, then it works, otherwise - not.

llvmbot commented 7 years ago

noworkable

llvmbot commented 7 years ago

workable

llvmbot commented 7 years ago

Can you attach the cpio of both cases?