the usage of `copy2Slice` here seems to be incorrect, PutInt64 should be used instead

fumeboy commented 1 year ago

https://github.com/pkujhd/goloader/blob/master/relocate.go#LL138C5-L138C5

copy2Slice used to copy the 8 bytes from *addr to *addr+7 into the TEXT segment, but the correct approach should be to copy the uint64 value addr itself as [8]byte.

pkujhd commented 1 year ago

@fumeboy , MOV instructor want to mov addr's content to register, so copy data to register, not addr to register.

pkujhd commented 1 year ago

If you encounter a bug，give me a testcase please. thx

fumeboy commented 1 year ago

I couldnt submit a PR on my work computer, but I can provide you with a code snippet.

first, write a Go file as our test module. this file will have a PCREL relocation pointing to a global variable.

package a

var val = 1

func Main() (output int) {
    return val
}

then change the bool condition here to always be true ( if true { ). run it and you can see a panic. https://github.com/eh-steve/goloader/blob/WIP---Update-github-actions-to-run-tests/relocate.go#L155

and fix it like this :

func (linker *Linker) relocatePCREL(addr uintptr, loc obj.Reloc, segment *segment, relocByte []byte, addrBase int) (err error) {
    byteorder := linker.Arch.ByteOrder

    if true { // offset > 0x7FFFFF || offset < -0x800000
        bytes := relocByte[loc.Offset-2:]
        modrm0 := relocByte[loc.Offset-1]
        opcode := relocByte[loc.Offset-2]
        if opcode == x86amd64MOVcode && loc.Size >= Uint32Size {
            // jump GOT
            copy(bytes, []byte{0xE9, 0x00, 0x00, 0x00, 0x00, 0x90})  // 0xE9 == JMP; 0x90 == NOP

            offset := (segment.codeBase + segment.codeOff) - (addrBase + loc.Offset + loc.Size - 1) // -1 because len([0xE9]), not len([0xE9, 0x00])
            byteorder.PutUint32(relocByte[loc.Offset-1:], uint32(offset))
        } else {
            return fmt.Errorf("not support code:%v!\n", relocByte[loc.Offset-2:loc.Offset])
        }

        {
            var MOV_A1 = []byte{
                0x48, 0xA1, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // MOVQ  *[64 bit address] =>  RAX
            }

            if modrm := (modrm0 & 0b00111000) | 0b11000000; modrm == 0b11000000 { // if mov RAX => RAX
                // MOVQ  *[64 bit address] =>  RAX
                copy(segment.codeByte[segment.codeOff:], MOV_A1) 
                byteorder.PutUint64(segment.codeByte[segment.codeOff+2:], uint64(addr)+uint64(loc.Add))
                segment.codeOff += len(MOV_A1)
            } else { // else mov RAX => other REG
                // push RAX
                segment.codeByte[segment.codeOff] = 0x50 
                segment.codeOff++

                // mov *[64 bit address] =>  RAX
                copy(segment.codeByte[segment.codeOff:], MOV_A1) 
                byteorder.PutUint64(segment.codeByte[segment.codeOff+2:], uint64(addr)+uint64(loc.Add))
                segment.codeOff += len(MOV_A1)

                // MOV RAX => REG
                var MOV_8B = []byte{0x48, 0x8B, 0x00} 
                copy(segment.codeByte[segment.codeOff:], MOV_8B)
                segment.codeByte[segment.codeOff+len(MOV_8B)-1] = modrm
                segment.codeOff += len(MOV_8B)

                // pop RAX
                segment.codeByte[segment.codeOff] = 0x58 
                segment.codeOff++
            }

            // JMP back
            copy(segment.codeByte[segment.codeOff:], []byte{0xff, 0x25, 0x00, 0x00, 0x00, 0x00})
            segment.codeOff += 6
            putAddressAddOffset(byteorder, segment.codeByte, &segment.codeOff, uint64(addrBase+loc.Offset+loc.Size))
        }
    }
    return err
}

eh-steve commented 1 year ago

On my fork there is indeed some faulty arithmetic in the jump addresses in these PCREL relocs - hopefully working on a proper fix soon. @pkujhd is correct though in that the value at the address is what we need to copy for a 0x8B MOV...

pkujhd commented 1 year ago

On my fork there is indeed some faulty arithmetic in the jump addresses in these PCREL relocs - hopefully working on a proper fix soon. @pkujhd is correct though in that the value at the address is what we need to copy for a 0x8B MOV...

@eh-steve maybe faulty arithmetic is use jmp short/near instruction, the offset doesn't add current instruction offset, after current jmp short/near instrunction readed, the pc register is modified

fumeboy commented 1 year ago

@eh-steve i dont understand why copy the value at the address is right. because the value of variable could be change at runtime, if we copy with the imm, we only got the initial value.

so we should not use B8 to just mov an imm, should use A1 to mov the value from mem to RAX, then use 8B to mov the value from RAX to target register.

pkujhd commented 1 year ago

@eh-steve i dont understand why copy the value at the address is right. because the value of variable could be change at runtime, if we copy, we only got the initial value.

There may be the problem you mentioned, but the data dealed by the linker is stable data in the runtime， e.g most statements using movQ are used to read os.Stdout.

Best way is like CMPL instruction，use multiply instructions to implement MOVQ.

eh-steve commented 1 year ago

@fumeboy There is indeed a problem with the current far address PCREL asm, both in terms of the JMP arithmetic, and the static pointer indirection that used to happen at link time instead of runtime. This would be ok for static values in addresses (e.g. itabs) but not for global variables as your example points out.

I've added a test case for this issue which the upstream currently fails, and completely rewritten the x86 PCREL implementation in this commit to correct both problems: https://github.com/eh-steve/goloader/pull/5/commits/87f559aca3b26419ca2a8ae9e968ac63969e2bef

I've also added x86 R_GOTPCREL/R_TLS_IE support into the linker (arm64 still TODO), and enabled -dynlink by default in the JIT package, which avoids most (but not all) of these PCREL relocs.

Because not all registers are capable of using the A1 MOVABS RAX, [add64] instruction, I've gone with 2 different approaches, one for when the destination register is RAX, and one when it's another arbitrary register (push RAX, read immediate into RAX, then indirect the value into the destination register, then pop RAX). Woudn't your suggestion of A1 first then 8B indirect the address twice - did you mean 89 instead?

If you want to give it a try on the tip of that branch (disable -dynlink in mergeBuildFlags(), and edit relocatePCREL() to have if offset > 0x7FFFFFFF || offset < -0x80000000 || loc.EpilogueSize > 0 {), feel free, and thanks again for your careful attention!

pkujhd commented 1 year ago

@eh-steve @fumeboy ok, I will migrate the bug fix to my repos. thanks

fumeboy commented 1 year ago

I've also added x86 R_GOTPCREL/R_TLS_IE support into the linker (arm64 still TODO), and enabled -dynlink by default in the JIT package, which avoids most (but not all) of these PCREL relocs.

thanks, and why not all ?

eh-steve commented 1 year ago

thanks, and why not all ?

Off the top of my head, const strings, float literals, stack tmps, CGo wrappers etc will still use PCRELs not GOT relocs (on the expectation that these symbols would be laid out relatively local to the text).

Because the host binary might mmap CGo shared library symbols at some far addresses, we still need to support these relocation epilogues

pkujhd commented 1 year ago

@eh-steve， I wirte some test code, I found some other things, if a global vaiable is not in current package but it is used in loader, use this golobal varable lead to far address problem,
Almost all instructions need to be processed, not just MOVQ/JMPL it is a big project.

`package main

import ( "fmt"

"github.com/pppa"

)

var a = 1

func main() { a = pppa.Val + 1 a = pppa.Val - 1 pppa.Test() fmt.Println(pppa.Val) } ` The above code has been generated INCQ/SUBQ/MOVQ(89)

fumeboy commented 1 year ago

I think some problems not necessary to be solved. a safe design should not allow user modules to "write" to global variables outside of their own package. If access to specific global variables is required, it should be specifically injected by the linker like this:

package user_module

var will_be_inject *int // inject the value of `pppa.Val` at linktime

func main() {
  a := *will_be_inject + 1
  a = *will_be_inject - 1
  fmt.Println(*will_be_inject)
}

however, this depends on the scene which we want to use goloader.

eh-steve commented 1 year ago

@pkujhd I don't think this is a problem in my fork due to both -dynlink avoiding complex PCRELs and mmap_manager guaranteeing a maximum distance. It can build huge packages (http/k8s/gonum) with lots of globals with dependencies from hundreds of packages.

Basic test case added: https://github.com/eh-steve/goloader/pull/5/commits/a1c74096a93bb2f0734cdf048152cddb31703581

The other reason mmap_manager is required is because R_METHODOFF gets written into an int32 textOff for uncommon and itab type's ifn/tfn values in methods, so needs to fit inside 32 bits (rewriting asm doesn't help as the runtime resolves these directly using (*_type).textOff(mhdr.ifn)). R_METHODOFFs would end up split across module boundaries for a type defined in the firstmodule, whose methods were unreachable from the firstmodule (and therefore deadcode eliminated by the Go linker's reachability analysis), if a JIT package then attempts to use those methods. TestIssue55 demonstrates this problem/solution

eh-steve commented 1 year ago

I think some problems not necessary to be solved. a safe design should not allow user modules to "write" to global variables outside of their own package. If access to specific global variables is required, it should be specifically injected by the linker...

I disagree with this as a goal. I think we should aim to make goloader behave as intuitively to the developer as possible, and that means expecting JIT code to behave the same as if it were compiled into the host binary. I think the way to achieve this is via careful symbol deduplication across the host binary and JIT code for packages that are partially included (+itab patching).

This is one of the reasons I added the tests to prove the ability to build and run a very large Go project like k8s via JIT

fumeboy commented 1 year ago

perhaps we can add a patch to the compiler to make it generate GOT_PCREL for the symbols we specify. it seems not very difficult.

https://github.com/golang/go/blob/b100e127ca0e398fbb58d04d04e2443b50b3063e/src/cmd/internal/obj/x86/obj6.go#L314

eh-steve commented 1 year ago

perhaps we can add a patch to the compiler to make it generate GOT_PCREL for the symbols we specify. it seems not very difficult.

Ideally we wouldn't force the use of GOT for local symbols (e.g stmps/consts) because it will be less efficient code. Also this doesn't address the problem for native CGo code which we can't patch as easily.

Like I said before, I don't think this is a problem (on my fork) so we don't need to do anything, and use of -dynlink alone can't remove the need for mmap manager due to how itab method offsets work in Go

fumeboy commented 1 year ago

@eh-steve

Why does textOff cause problems?

If R_MethodOff points to firstmodule, the off = firstmodule.text - target_addr If R_MethodOff points to localmodule, theoff = local.text - target_addr

Even if we want to access the method of the first module from the local module, according to the source code, there doesn't seem to be any problem, right? because the textOff function will use the corresponding text segment address to calculate the absolute address.

func (t *_type) textOff(off textOff) unsafe.Pointer {
    if off == -1 {
        // -1 is the sentinel value for unreachable code.
        // See cmd/link/internal/ld/data.go:relocsym.
        return unsafe.Pointer(abi.FuncPCABIInternal(unreachableMethod))
    }
    base := uintptr(unsafe.Pointer(t))
    var md *moduledata
    for next := &firstmoduledata; next != nil; next = next.next {
        if base >= next.types && base < next.etypes {
            md = next
            break
        }
    }
        // ...
        res := md.textAddr(uint32(off))

if DCE will remove the unreachable, why not use type register?

eh-steve commented 1 year ago

Why does textOff cause problems?

The case you described is fine (and straightforward).

The problem occurs with types which have unreachable methods.

The base text address is resolved using the module the type is defined in.

If a type is defined in the firstmodule, but it's method is deadcode eliminated as it's unreachable from the host binary, but then the JIT module wants to use this type and method, it has to use the firstmodule type, but the JIT version of the method text which is in the JIT module. This means the type and text are in different modules (which is fine, but the offset between them can only be maximum 32 bits).

stdlib's plugin avoids this problem by disabling deadcode elimination for methods entirely as soon as the plugin package is used anywhere (which bloats the host binary).

My approach was to keep types in the firstmodule, but patch their method text offsets to point at text in the JIT module

eh-steve commented 1 year ago

if DCE will remove the unreachable, why not use type register?

I don't understand what you mean here

fumeboy commented 1 year ago

if DCE will remove the unreachable, why not use type register?

I don't understand what you mean here

https://github.com/pkujhd/goloader/blob/master/examples/loader/loader.go#L69

use type register to keep symbol not be deadcode eliminated

eh-steve commented 1 year ago

Yeah you can achieve the same just by doing

_ = reflect.ValueOf(&sync.WaitGroup{})

But that requires you to know in advance (in the host binary) everything that might ever be compiled in JIT, which for my use case was impossible (since the JIT code is user-configured). That's why in my fork, this type registration isn't needed at all

eh-steve commented 1 year ago

@fumeboy Just to let you know I'm now more happy with the implementation of x86 PCREL relocs in https://github.com/eh-steve/goloader/pull/5, and that branch fixes a bunch of other bugs. There's still an issue I'd like to fix before merging, and I'd also like to see all 60 (x104) test variations (across OS/arch/+-CGo/+-dynlink) pass if possible

eh-steve commented 1 year ago

@pkujhd You copied the wrong version of x86amd64replaceCALLcode and x86amd64replaceCMPLcode because RAX is used to pass the first arg of a function in the new regabi, and CMP needs to indirect RAX ([RAX] using 0x38 not 0xf8). Please make sure you're copying from my latest branch.

eh-steve commented 1 year ago

Also you're not pointer aligning the epilogues on arm64 when you're adding a 20 byte maxExtraCodeSize_ADDRARM64 epilogue. This can cause instruction alignment faults on arm64

eh-steve commented 1 year ago

And you're not including the high bit of the destination register from the REX.W prefix in your MOV PCREL rewrite

eh-steve commented 1 year ago

There's a bunch more mistakes I think - please look at my master branch and copy from there

pkujhd commented 1 year ago

There's a bunch more mistakes I think - please look at my master branch and copy from there

Thank you for calling attention to my error. I will fix it

pkujhd commented 1 year ago

@pkujhd You copied the wrong version of x86amd64replaceCALLcode and x86amd64replaceCMPLcode because RAX is used to pass the first arg of a function in the new regabi, and CMP needs to indirect RAX ([RAX] using 0x38 not 0xf8). Please make sure you're copying from my latest branch.

in x86amd64replaceCMPLcode, i mov RAX [RAX], so CMP RAX x, your code has fewer instructions, it is better.

x86amd64replaceCALLcode may be rewrite like R_CALL, not replace CALL instruction with JMPN.

eh-steve commented 1 year ago

x86amd64replaceCALLcode may be rewrite like R_CALL, not replace CALL instruction with JMPN.

That's a good idea, running tests here https://github.com/eh-steve/goloader/pull/14

pkujhd / goloader

the usage of `copy2Slice` here seems to be incorrect, PutInt64 should be used instead #78