Closed fumeboy closed 1 year ago
@fumeboy , MOV instructor want to mov addr's content to register, so copy data to register, not addr to register.
If you encounter a bug,give me a testcase please. thx
I couldnt submit a PR on my work computer, but I can provide you with a code snippet.
first, write a Go file as our test module. this file will have a PCREL relocation pointing to a global variable.
package a
var val = 1
func Main() (output int) {
return val
}
then change the bool condition here to always be true ( if true {
). run it and you can see a panic.
https://github.com/eh-steve/goloader/blob/WIP---Update-github-actions-to-run-tests/relocate.go#L155
and fix it like this :
func (linker *Linker) relocatePCREL(addr uintptr, loc obj.Reloc, segment *segment, relocByte []byte, addrBase int) (err error) {
byteorder := linker.Arch.ByteOrder
if true { // offset > 0x7FFFFF || offset < -0x800000
bytes := relocByte[loc.Offset-2:]
modrm0 := relocByte[loc.Offset-1]
opcode := relocByte[loc.Offset-2]
if opcode == x86amd64MOVcode && loc.Size >= Uint32Size {
// jump GOT
copy(bytes, []byte{0xE9, 0x00, 0x00, 0x00, 0x00, 0x90}) // 0xE9 == JMP; 0x90 == NOP
offset := (segment.codeBase + segment.codeOff) - (addrBase + loc.Offset + loc.Size - 1) // -1 because len([0xE9]), not len([0xE9, 0x00])
byteorder.PutUint32(relocByte[loc.Offset-1:], uint32(offset))
} else {
return fmt.Errorf("not support code:%v!\n", relocByte[loc.Offset-2:loc.Offset])
}
{
var MOV_A1 = []byte{
0x48, 0xA1, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // MOVQ *[64 bit address] => RAX
}
if modrm := (modrm0 & 0b00111000) | 0b11000000; modrm == 0b11000000 { // if mov RAX => RAX
// MOVQ *[64 bit address] => RAX
copy(segment.codeByte[segment.codeOff:], MOV_A1)
byteorder.PutUint64(segment.codeByte[segment.codeOff+2:], uint64(addr)+uint64(loc.Add))
segment.codeOff += len(MOV_A1)
} else { // else mov RAX => other REG
// push RAX
segment.codeByte[segment.codeOff] = 0x50
segment.codeOff++
// mov *[64 bit address] => RAX
copy(segment.codeByte[segment.codeOff:], MOV_A1)
byteorder.PutUint64(segment.codeByte[segment.codeOff+2:], uint64(addr)+uint64(loc.Add))
segment.codeOff += len(MOV_A1)
// MOV RAX => REG
var MOV_8B = []byte{0x48, 0x8B, 0x00}
copy(segment.codeByte[segment.codeOff:], MOV_8B)
segment.codeByte[segment.codeOff+len(MOV_8B)-1] = modrm
segment.codeOff += len(MOV_8B)
// pop RAX
segment.codeByte[segment.codeOff] = 0x58
segment.codeOff++
}
// JMP back
copy(segment.codeByte[segment.codeOff:], []byte{0xff, 0x25, 0x00, 0x00, 0x00, 0x00})
segment.codeOff += 6
putAddressAddOffset(byteorder, segment.codeByte, &segment.codeOff, uint64(addrBase+loc.Offset+loc.Size))
}
}
return err
}
On my fork there is indeed some faulty arithmetic in the jump addresses in these PCREL relocs - hopefully working on a proper fix soon. @pkujhd is correct though in that the value at the address is what we need to copy for a 0x8B MOV...
On my fork there is indeed some faulty arithmetic in the jump addresses in these PCREL relocs - hopefully working on a proper fix soon. @pkujhd is correct though in that the value at the address is what we need to copy for a 0x8B MOV...
@eh-steve maybe faulty arithmetic is use jmp short/near instruction, the offset doesn't add current instruction offset, after current jmp short/near instrunction readed, the pc register is modified
@eh-steve i dont understand why copy the value at the address
is right. because the value of variable could be change at runtime, if we copy with the imm
, we only got the initial value.
so we should not use B8 to just mov an imm
, should use A1 to mov the value from mem to RAX, then use 8B to mov the value from RAX to target register.
@eh-steve i dont understand why copy
the value at the address
is right. because the value of variable could be change at runtime, if we copy, we only got the initial value.
There may be the problem you mentioned, but the data dealed by the linker is stable data in the runtime, e.g most statements using movQ are used to read os.Stdout.
Best way is like CMPL instruction,use multiply instructions to implement MOVQ.
@fumeboy There is indeed a problem with the current far address PCREL asm, both in terms of the JMP arithmetic, and the static pointer indirection that used to happen at link time instead of runtime. This would be ok for static values in addresses (e.g. itabs) but not for global variables as your example points out.
I've added a test case for this issue which the upstream currently fails, and completely rewritten the x86 PCREL implementation in this commit to correct both problems: https://github.com/eh-steve/goloader/pull/5/commits/87f559aca3b26419ca2a8ae9e968ac63969e2bef
I've also added x86 R_GOTPCREL
/R_TLS_IE
support into the linker (arm64
still TODO), and enabled -dynlink
by default in the JIT package, which avoids most (but not all) of these PCREL relocs.
Because not all registers are capable of using the A1 MOVABS RAX, [add64]
instruction, I've gone with 2 different approaches, one for when the destination register is RAX, and one when it's another arbitrary register (push RAX, read immediate into RAX, then indirect the value into the destination register, then pop RAX). Woudn't your suggestion of A1 first then 8B indirect the address twice - did you mean 89 instead?
If you want to give it a try on the tip of that branch (disable -dynlink
in mergeBuildFlags()
, and edit relocatePCREL()
to have if offset > 0x7FFFFFFF || offset < -0x80000000 || loc.EpilogueSize > 0 {
), feel free, and thanks again for your careful attention!
@eh-steve @fumeboy ok, I will migrate the bug fix to my repos. thanks
I've also added x86
R_GOTPCREL
/R_TLS_IE
support into the linker (arm64
still TODO), and enabled-dynlink
by default in the JIT package, which avoids most (but not all) of these PCREL relocs.
thanks, and why not all
?
thanks, and why
not all
?
Off the top of my head, const strings, float literals, stack tmps, CGo wrappers etc will still use PCRELs not GOT relocs (on the expectation that these symbols would be laid out relatively local to the text).
Because the host binary might mmap CGo shared library symbols at some far addresses, we still need to support these relocation epilogues
@eh-steve, I wirte some test code, I found some other things, if a global vaiable is not in current package but it is used in loader, use this golobal varable lead to far address problem,
Almost all instructions need to be processed, not just MOVQ/JMPL
it is a big project.
`package main
import ( "fmt"
"github.com/pppa"
)
var a = 1
func main() { a = pppa.Val + 1 a = pppa.Val - 1 pppa.Test() fmt.Println(pppa.Val) } ` The above code has been generated INCQ/SUBQ/MOVQ(89)
I think some problems not necessary to be solved. a safe design should not allow user modules to "write" to global variables outside of their own package. If access to specific global variables is required, it should be specifically injected by the linker like this:
package user_module
var will_be_inject *int // inject the value of `pppa.Val` at linktime
func main() {
a := *will_be_inject + 1
a = *will_be_inject - 1
fmt.Println(*will_be_inject)
}
however, this depends on the scene which we want to use goloader.
@pkujhd I don't think this is a problem in my fork due to both -dynlink avoiding complex PCRELs and mmap_manager guaranteeing a maximum distance. It can build huge packages (http/k8s/gonum) with lots of globals with dependencies from hundreds of packages.
Basic test case added: https://github.com/eh-steve/goloader/pull/5/commits/a1c74096a93bb2f0734cdf048152cddb31703581
The other reason mmap_manager is required is because R_METHODOFF
gets written into an int32 textOff
for uncommon
and itab
type's ifn
/tfn
values in methods, so needs to fit inside 32 bits (rewriting asm doesn't help as the runtime resolves these directly using (*_type).textOff(mhdr.ifn)
). R_METHODOFF
s would end up split across module boundaries for a type defined in the firstmodule, whose methods were unreachable from the firstmodule (and therefore deadcode eliminated by the Go linker's reachability analysis), if a JIT package then attempts to use those methods. TestIssue55
demonstrates this problem/solution
I think some problems not necessary to be solved. a safe design should not allow user modules to "write" to global variables outside of their own package. If access to specific global variables is required, it should be specifically injected by the linker...
I disagree with this as a goal. I think we should aim to make goloader behave as intuitively to the developer as possible, and that means expecting JIT code to behave the same as if it were compiled into the host binary. I think the way to achieve this is via careful symbol deduplication across the host binary and JIT code for packages that are partially included (+itab patching).
This is one of the reasons I added the tests to prove the ability to build and run a very large Go project like k8s via JIT
perhaps we can add a patch to the compiler to make it generate GOT_PCREL for the symbols we specify. it seems not very difficult.
perhaps we can add a patch to the compiler to make it generate GOT_PCREL for the symbols we specify. it seems not very difficult.
Ideally we wouldn't force the use of GOT for local symbols (e.g stmps/consts) because it will be less efficient code. Also this doesn't address the problem for native CGo code which we can't patch as easily.
Like I said before, I don't think this is a problem (on my fork) so we don't need to do anything, and use of -dynlink alone can't remove the need for mmap manager due to how itab method offsets work in Go
@eh-steve
Why does textOff cause problems?
If R_MethodOff
points to firstmodule, the off = firstmodule.text - target_addr
If R_MethodOff
points to localmodule, theoff = local.text - target_addr
Even if we want to access the method of the first module from the local module, according to the source code, there doesn't seem to be any problem, right? because the textOff
function will use the corresponding text segment address to calculate the absolute address.
func (t *_type) textOff(off textOff) unsafe.Pointer {
if off == -1 {
// -1 is the sentinel value for unreachable code.
// See cmd/link/internal/ld/data.go:relocsym.
return unsafe.Pointer(abi.FuncPCABIInternal(unreachableMethod))
}
base := uintptr(unsafe.Pointer(t))
var md *moduledata
for next := &firstmoduledata; next != nil; next = next.next {
if base >= next.types && base < next.etypes {
md = next
break
}
}
// ...
res := md.textAddr(uint32(off))
if DCE will remove the unreachable, why not use type register
?
Why does textOff cause problems?
The case you described is fine (and straightforward).
The problem occurs with types which have unreachable methods.
The base text address is resolved using the module the type is defined in.
If a type is defined in the firstmodule, but it's method is deadcode eliminated as it's unreachable from the host binary, but then the JIT module wants to use this type and method, it has to use the firstmodule type, but the JIT version of the method text which is in the JIT module. This means the type and text are in different modules (which is fine, but the offset between them can only be maximum 32 bits).
stdlib's plugin
avoids this problem by disabling deadcode elimination for methods entirely as soon as the plugin package is used anywhere (which bloats the host binary).
My approach was to keep types in the firstmodule, but patch their method text offsets to point at text in the JIT module
if DCE will remove the unreachable, why not use type register?
I don't understand what you mean here
if DCE will remove the unreachable, why not use type register?
I don't understand what you mean here
https://github.com/pkujhd/goloader/blob/master/examples/loader/loader.go#L69
use type register
to keep symbol not be deadcode eliminated
Yeah you can achieve the same just by doing
_ = reflect.ValueOf(&sync.WaitGroup{})
But that requires you to know in advance (in the host binary) everything that might ever be compiled in JIT, which for my use case was impossible (since the JIT code is user-configured). That's why in my fork, this type registration isn't needed at all
@fumeboy Just to let you know I'm now more happy with the implementation of x86 PCREL relocs in https://github.com/eh-steve/goloader/pull/5, and that branch fixes a bunch of other bugs. There's still an issue I'd like to fix before merging, and I'd also like to see all 60 (x104) test variations (across OS/arch/+-CGo/+-dynlink) pass if possible
@pkujhd You copied the wrong version of x86amd64replaceCALLcode
and x86amd64replaceCMPLcode
because RAX is used to pass the first arg of a function in the new regabi, and CMP
needs to indirect RAX ([RAX]
using 0x38
not 0xf8
). Please make sure you're copying from my latest branch.
Also you're not pointer aligning the epilogues on arm64 when you're adding a 20 byte maxExtraCodeSize_ADDRARM64
epilogue. This can cause instruction alignment faults on arm64
And you're not including the high bit of the destination register from the REX.W prefix in your MOV PCREL rewrite
There's a bunch more mistakes I think - please look at my master branch and copy from there
There's a bunch more mistakes I think - please look at my master branch and copy from there
Thank you for calling attention to my error. I will fix it
@pkujhd You copied the wrong version of
x86amd64replaceCALLcode
andx86amd64replaceCMPLcode
because RAX is used to pass the first arg of a function in the new regabi, andCMP
needs to indirect RAX ([RAX]
using0x38
not0xf8
). Please make sure you're copying from my latest branch.
in x86amd64replaceCMPLcode, i mov RAX [RAX], so CMP RAX x, your code has fewer instructions, it is better.
x86amd64replaceCALLcode may be rewrite like R_CALL, not replace CALL instruction with JMPN.
x86amd64replaceCALLcode may be rewrite like R_CALL, not replace CALL instruction with JMPN.
That's a good idea, running tests here https://github.com/eh-steve/goloader/pull/14
https://github.com/pkujhd/goloader/blob/master/relocate.go#LL138C5-L138C5
copy2Slice
used to copy the 8 bytes from*addr
to*addr+7
into the TEXT segment, but the correct approach should be to copy the uint64 valueaddr
itself as [8]byte.