Open mnznndr97 opened 11 months ago
If you kill any drakvuf/injector process with SIGTERM, you should not get that error when you try to launch again. If you use kill -9, you will get that error. You can also specify a timeout for injector, after which it exits cleanly if the injection didn't go through in time.
DRAKVUF-Sandbox does subprocess.run() with a timeout (https://github.com/CERT-Polska/drakvuf-sandbox/blob/master/drakrun/drakrun/injector.py#L99), which seems to send SIGKILL after the timeout has expired (https://github.com/python/cpython/blob/main/Lib/subprocess.py#L554-L556). The easiest fix is in DRAKVUF-Sandbox (change SIGKILL to SIGTERM, or add a timeout param to injector); the proper fix is to figure out why injector sometimes doesn't finish.
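The "SIGTERM instead of SIGKILL" idea could be sketched roughly as follows. This is only an illustration, not DRAKVUF-Sandbox's actual code: the helper name run_with_graceful_timeout and the grace parameter are made up for this example, and the child command is a stand-in for the real injector invocation. It assumes a POSIX host, where Popen.terminate() sends SIGTERM and Popen.kill() sends SIGKILL.

```python
import subprocess
import sys


def run_with_graceful_timeout(args, timeout, grace=5.0):
    """Run args; on timeout send SIGTERM first and SIGKILL only as a last resort.

    Hypothetical helper. Returns the child's exit code (on POSIX, a negative
    signal number if the child was terminated by a signal).
    """
    proc = subprocess.Popen(args)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # SIGTERM on POSIX: the child gets a chance to clean up before exiting.
        proc.terminate()
        try:
            return proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            # The child ignored SIGTERM; SIGKILL leaves no room for cleanup.
            proc.kill()
            return proc.wait()


if __name__ == "__main__":
    # Stand-in child that sleeps far longer than the timeout.
    child = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_with_graceful_timeout(child, timeout=1.0))
```

The grace period keeps the caller from blocking forever if the child never reacts to SIGTERM, at the cost of falling back to the same SIGKILL behaviour that causes the relaunch error.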
I did some research on this issue. tl;dr: Lots of threads in explorer are pretty short-lived and can be terminated in the middle of injection by another thread. On the other hand, if you pin injector to one specific thread (I guess one involved in message processing), you should get pretty reliable results. I've not tested it on a production setup though.
More information: https://github.com/CERT-Polska/drakvuf-sandbox/issues/933
I think it would be even better to run long-running injector actions in a dedicated thread created for that purpose instead of relying on hijacked ones. This way we're not delaying things that explorer.exe does and that may have timeouts/hang detection.
Hi everyone, we've encountered a problem while performing some tests with drakvuf-sandbox. In our case, injector seems to hang randomly for an undefined amount of time. We use drakvuf + sandbox compiled directly from the source code, tags 1.0 and 0.18.2 respectively. We've added some temporary log points in the source to avoid the overhead of PRINT_DEBUG and we have this situation:
The logs were added along the injector path (64 bit) that creates the remote process and waits for its termination. As is clearly visible, 5 minutes pass between the termination and the explorer user-mode breakpoint being hit, but depending on the situation, the bp might never be hit again.
We noticed that the "injection" user-mode breakpoint is placed on an arbitrary address taken from the trap frame, so after the injector STEP3 (return of the ResumeThread call) it seems, as far as we understand, that there is no guarantee that explorer.exe will execute this code again. Could this be the reason for the hang?
Is there any reason why the "STEP4" code cannot be called directly from the wait_for_termination_cb function instead of relying on the explorer breakpoint?
We also noticed that in the related sandbox issue (https://github.com/CERT-Polska/drakvuf-sandbox/pull/749), timeouts were fixed, but in our case killing the injector process in the middle of the drakvuf loop prevents any other instance from being launched, with this error