Closed wangbj closed 5 years ago
where
syscall_site+0x1
is the instruction after the breakpoint instruction.
If the syscall is always one byte, how does this generate an illegal instruction?
The binary instrumentation performed by the tracer could be happening on a different core, right? I guess ideally, after any instrumentation, the guest process would wake up in a state where it executes cpuid
and then jumps to the desired PC...
@chamibuddhika - do you have a theory here?
Syscall insn is two bytes in x86, bkpt is one byte
I am not quite sure what's happening here. Do we know which thread encounter the SIGILL? IIRC ptrace could be setup to halt a given thread only. But I guess it's not the mode of operation here.
sometimes it happens because we modified the old 2-byte syscall
instruction to:
1) callq trampoline (step1) 2) patch 1) to
.byte 0xcc // bkpt
rest_of_callq // partial instruction
3) rewind pc (pc -= 2), and let bkpt
instruction hit
4) restore the partial instruction in step 2)
5) rewind pc (pc -= 1), remove bkpkt
, and continue;
SIGILL sometimes happens at step 5), the proposed (merged) change:
1) callq trampoline 2) rewind pc (pc -=2)
I don't think the old method is entirely necessary and it over complicates things. Thus merged the change.
Ah, yes if there is a two byte syscall, then certainly syscall+1 is an anvilid PC.
Close this issue for now will reopen when this problem reproduce
When patching tracee's syscall site, we did a dummy synchronize function
synchronize_from
, it did:0xcc
) at the originalsyscall
instructionsinglestep
and let the breakpoint hitsyscall
instructionHowever, it causes
SIGILL
withPC=syscall_site+0x1
, wheresyscall_site+0x1
is the instruction after the breakpoint instruction.After the
synchronize_from
hack is removed, we didn't see anySIGILL
after more than 100 runs.In the future, if the
SIGILL
issue still afloat, we could changePC
to a completely different location, such as 0x7000_xxxx, where we still have a lot of space space, the instruction sequence could be:Then change
PC
back to where we're from. This should force the tracee do synchronization.synchronize_from
function is killed per commit: 2a87870ff79a8f43146708155db4a8699dec6069