Closed: pruzko closed this issue 4 years ago
We could add
# stop vm execution, thread-safe
vm.interrupt()
# returns True if the vm is running
vm.is_running()
so you would be able to do
# main thread
while vm.is_running():
    vm.exec()

# and in another thread
vm.interrupt()
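The intended two-thread usage can be sketched without icebox. `StubVm` below is a hypothetical stand-in (only `exec`/`interrupt`/`is_running` mirror the proposed API; the blocking behavior is simulated with a `threading.Event`):

```python
import threading
import time

class StubVm:
    """Hypothetical stand-in for an icebox vm: exec() blocks until interrupt()."""
    def __init__(self):
        self._interrupted = threading.Event()

    def exec(self):
        # blocks like vm.exec() until another thread calls interrupt()
        self._interrupted.wait()

    def interrupt(self):
        # thread-safe: Event.set() may be called from any thread
        self._interrupted.set()

    def is_running(self):
        return not self._interrupted.is_set()

vm = StubVm()
worker = threading.Thread(target=vm.exec)  # stand-in for the exec loop
worker.start()
time.sleep(0.1)
vm.interrupt()                             # called from another thread
worker.join(timeout=1)
```

The key property the real implementation needs is the same as the stub's: `interrupt()` must be safe to call from a thread other than the one blocked in `exec()`.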
I'd need to modify state::exec, but I agree that interrupting vm execution is currently missing, and we need it in many scenarios.
Untested branch: https://github.com/thalium/icebox/tree/vm_interrupt
Thank you for the swift reply. The vm.interrupt is a good idea.
I have installed and tested the vm_interrupt branch. src/icebox/icebox_py/__init__.py is missing the following export:
def interrupt(self):
    """Interrupt vm"""
    _icebox.interrupt()
I will add it in a PR.
However, the interrupt concept does not work yet. Consider the following example (which will also be part of the PR):
import time
import threading
import icebox


class Worker(threading.Thread):
    def __init__(self):
        super().__init__()
        self.vm = None
        self.quit = False

    def run(self):
        try:
            self.vm = icebox.attach('Win10')
            proc = self.vm.processes.find_name('csrss.exe')
            addr = proc.symbols.address('ntdll!DbgPrint')
            phys_addr = proc.memory.physical_address(addr)
            with self.vm.break_on_physical(phys_addr, self.callback):
                print('Worker: Start')
                while not self.quit:
                    print('About to exec')
                    self.vm.exec()
            print('Worker: Done')
        finally:
            print('finally block')
            self.vm.resume()

    def stop(self):
        print('stopping')
        self.quit = True
        self.vm.interrupt()

    def callback(self):
        print('OK')


worker = Worker()
print('Master: Starting worker')
worker.start()  # blocks the whole process for some reason
for i in range(5):
    print(f'Sleeping {i}')
    time.sleep(1)
worker.stop()
worker.join()
print('Master: Done')
The interrupt indeed breaks the vm.exec() operation. The problem is that vm.exec() blocks both threads. I don't know how that's possible, but the for i in range(5) loop does not run until a break-point is hit. A single iteration executes each time a break-point is hit, so if you trigger it 5 times, the vm.interrupt() gets executed and everything works as expected.
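One native call freezing every Python thread is the classic symptom of a C extension holding the GIL. The effect can be reproduced with nothing but ctypes (a sketch; no icebox involved): calls made through ctypes.CDLL release the GIL, calls made through ctypes.PyDLL keep it held, so a background thread only makes progress in the first case.

```python
import ctypes
import ctypes.util
import threading
import time

name = ctypes.util.find_library("c")  # None falls back to the main program on POSIX
releases_gil = ctypes.CDLL(name)      # CDLL calls release the GIL
holds_gil = ctypes.PyDLL(name)        # PyDLL calls keep the GIL held

def count_ticks(dll):
    """Count ticks a background thread makes while the main thread
    sits inside a 0.3 s native usleep() made through `dll`."""
    ticks = []
    stop = threading.Event()

    def ticker():
        while not stop.is_set():
            ticks.append(None)
            time.sleep(0.01)

    t = threading.Thread(target=ticker)
    t.start()
    time.sleep(0.05)          # let the ticker get going
    before = len(ticks)
    dll.usleep(300_000)       # 0.3 s inside native code
    during = len(ticks) - before
    stop.set()
    t.join()
    return during

during_cdll = count_ticks(releases_gil)   # ticker keeps running
during_pydll = count_ticks(holds_gil)     # ticker is frozen, like with vm.exec()
```

If the icebox bindings behave like the PyDLL case, that would explain why the master's for loop only runs when the native wait briefly returns (i.e. on a break-point).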
It seems that the try_wait function is the troublesome one, but I can't find the cause. I was able to figure out that the following part of try_wait is blocking the other thread:
try_wait:
    while(!d.interrupted)
    {
        std::this_thread::yield(); // yielding does not seem to affect the problem, just saves CPU
        const auto ok = fdp::state_changed(d.core);
        if(!ok)
            continue; // this gets executed
        ...

fdp::state_changed:
    const auto ret = FDP_GetStateChanged(core.shm_->ptr);
    if(!ret)
        return false;
    ...

FDP_GetStateChanged:
    ???
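Restated in Python (a sketch: d.interrupted becomes a threading.Event, fdp::state_changed becomes a state_changed callable, and the timeout parameter is an illustrative assumption, not part of the C++), the polling loop looks like this. It also folds in the later fix of returning true when interrupted:

```python
import threading
import time

def try_wait(state_changed, interrupted: threading.Event, timeout=1.0):
    """Python sketch of the C++ try_wait polling loop."""
    deadline = time.monotonic() + timeout
    while not interrupted.is_set():
        time.sleep(0)                 # like std::this_thread::yield()
        if time.monotonic() > deadline:
            return False              # illustrative timeout, not in the C++
        if not state_changed():       # like fdp::state_changed(d.core)
            continue                  # this gets executed while nothing happens
        return True                   # a state change (e.g. break-point) occurred
    return True                       # treat interruption as success
```

Because the loop never blocks in the OS, a thread spinning here while holding the GIL starves every other Python thread, which matches the observed behavior.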
FDP_GetStateChanged uses spin-locks to protect some shared memory, but I have no idea why that would block the whole process.
Please, do you have any idea where to look next?
Thanks for the detailed bug report! I'll try to look at it, but the issue is probably around the Python bindings, where a native extension call is blocking. Maybe there is something that can be done to release the Python interpreter lock.
I've pushed a tentative fix where I release the Python thread state before waiting. It should explain your bug, but I won't be able to check it until later next week.
The current solution throws SIGSEGV, because the constructor of struct Handle accesses a reference to core while you construct it with a shared pointer that is null.
I got it running as follows.
binding.cpp
1) Add a null check in the constructor of Handle:
Handle(const std::shared_ptr<core::Core>& c)
{
    core = c;
    if(c)
    {
        state::on_blocking_call(*c, [=](state::blocking_e blocking)
        {
            if(blocking == state::blocking_e::begin)
                thread_state = PyEval_SaveThread();
            else
                PyEval_RestoreThread(thread_state);
        });
    }
}
2) Replace brace initialization with explicit constructor calls:

new(handle) Handle{core} --> new(handle) Handle(core); // 2 places

// replacing new(handle) Handle{{}}
auto core = std::shared_ptr<core::Core>(nullptr);
new(handle) Handle(core);
state.cpp

3) Return true from try_wait when interrupted; if it keeps returning false, you'll get a run-time exception RuntimeError: error.

if(d.interrupted)
    return true;
It works pretty well, except for one bug that is a little out of my reach, I guess. Breakpoints are removed while the VM is running if you leave the break-point context after an interrupt.

with self.vm.break_on_physical(addr, cb):
    self.vm.exec()  # gets interrupted

The warning: INFO fdp: fdp::unset_breakpoint called on is_running vm
Yes, it was an untested quick & dirty patch; I will fix it properly soon. About the breakpoint warnings: we probably need to pause the vm when we interrupt it, so we can remove breakpoints properly.
Please try the latest vm_interrupt branch. It's still untested, so the try_wait fix may not work (but hopefully it will ^^).
You may still get weird behavior inside Python during breakpoint handling (def callback in your example) if you do not enable WinPE mode during Windows boot and need to use page faults. I will eventually fix that too.
It works well, thanks a lot!
Suppose we have an example inspired by the getting-started post. The objective is to asynchronously stop the while True loop. This can be achieved using processes or threads. A thread solution could look like this:

The thread solution does not work, because the wait_for function is not exported to Python, and looking at the source code it is apparent that the function blocks until a break-point is hit (which I can't assume). The solution with processes almost works:

The worker process can receive SIGTERM in multiple scenarios. If it is received while waiting for a break-point, the VM is not blocked and everything works well. However, if the signal is received during the execution of a break-point call-back function, then the VM is blocked and needs to be resumed (see cleanup_routine). The problem with this solution is that when I override a handler for SIGTERM, the signal is never delivered. I also tried sending (various) signals with the kill command, but no luck. If the handler is not overridden, the library prints some logs and terminates (as expected).

The objective here is to stop the introspection and ensure the VM is running. The signals/threads/processes are only some failed attempts, and maybe they are not necessary at all. Please, what should I do?

PS: I use some synchronization primitives to ensure the worker has entered the while True loop before the master sends a signal. Also, a solution where the master waits until the worker's call-back function has finished is acceptable.
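For reference, the process-plus-SIGTERM shape works without icebox when the worker loop is pure Python: the handler fires reliably. The names (worker, on_sigterm) and the fork start method are assumptions for this sketch, which only models the structure described above. If the real loop spends its time inside a native call holding the interpreter, Python-level signal delivery is deferred until the call returns, which would be consistent with the handler never appearing to run:

```python
import multiprocessing
import signal
import time

def worker(started, stopped):
    def on_sigterm(signum, frame):
        stopped.set()          # cleanup would go here (e.g. vm.resume())
        raise SystemExit
    signal.signal(signal.SIGTERM, on_sigterm)
    started.set()              # tell the master the loop is about to start
    while True:                # stand-in for the introspection loop
        time.sleep(0.05)

ctx = multiprocessing.get_context("fork")  # assumes a POSIX system
started = ctx.Event()
stopped = ctx.Event()
proc = ctx.Process(target=worker, args=(started, stopped))
proc.start()
started.wait(timeout=5)        # master waits until the worker loop is entered
proc.terminate()               # sends SIGTERM
proc.join(timeout=5)
```

The synchronization via `started` mirrors the PS above: the master only signals once the worker has entered its loop, so the handler is guaranteed to be installed.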