Injector sometimes hangs while injecting commands into explorer

mnznndr97 commented 10 months ago

Hi everyone, we've encountered a problem while performing some tests with drakvuf-sandbox. In our case, the injector seems to hang randomly for an indeterminate amount of time. We use drakvuf and the sandbox compiled directly from source, tags 1.0 and 0.18.2 respectively.

We've added some temporary log points in the source to avoid the overhead of PRINT_DEBUG and we have this situation:

Launching injector with arguments: "injector" "-o" "json" "-d" "vm-1" "-r" "/var/lib/drakrun/profiles/kernel.json" "-i" "4244" "-k" "0x1aa002" "-m" "createproc" "-e" "cmd /C ipconfig /release >nul" "-w"
13:50:03.600 [WinInj] Searching for process with pid 4244
13:50:03.601 [WinInj] Pid 4244 EPROCESS @ 0xffffa68491f7d340
13:50:03.601 [WinInj] Searching for _KTHREAD.TrapFrame offset
13:50:03.601 [WinInj] _KTHREAD.TrapFrame offset: 0x90
13:50:03.601 [WinInj] Searching for _KTRAP_FRAME.Rip offset
13:50:03.601 [WinInj] _KTRAP_FRAME.Rip offset: 0x168
13:50:03.601 [WinInj] Initializing function addresses

13:50:03.602 [WinUtils] Getting kernel32.dll.ResumeThread() function address from eprocess 0xffffa68491f7d340
13:50:03.602 [WinUtils] Function @ 0x00007fff6e6ee3d0
13:50:03.602 [WinUtils] Getting kernel32.dll.CreateProcessW() function address from eprocess 0xffffa68491f7d340
13:50:03.602 [WinUtils] Function @ 0x00007fff6e6ece70
13:50:03.602 [WinInj] Injecting ...
13:50:03.602 [WinInj] Adding CR3 trap
13:50:03.602 [WinInj] Starting injection loop
13:50:03.952 [WinInj] CR3 associated process is 4244
13:50:03.952 [WinInj] Reading KTHREAD TrapFrame address
13:50:03.952 [WinInj] Trapframe @ 0xffff8a827bd9cb00
13:50:03.952 [WinInj] Reading TrapFrame RIP
13:50:03.952 [WinInj] UserMode RIP @ 0x00007fff6f470a74
13:50:03.952 [WinInj] Adding BP @ VA 0x00007fff6f470a74, CR3: 0x0000000151d72002
13:50:03.952 [WinInj] Removing CR3 trap
13:50:03.952 [WinInj] INT3 trap @ 0x00007fff6f470a74 for PID 4244
13:50:03.952 [WinCreateProc] Entering CreateProc handler
13:50:03.952 [WinCreateProc] CreateProc step1
13:50:03.969 [WinInj] INT3 trap @ 0x00007fff6f470a74for PID 4244
13:50:03.969 [WinCreateProc] Entering CreateProc handler
13:50:03.969 [WinCreateProc] CreateProcessW exited
13:50:03.969 [WinCreateProc] Injected PID: 7228. TID: 5556
13:50:03.970 [WinCreateProc] Waiting for injected process
13:50:03.970 [WinCreateProc] Process start detected 7228 -> 0xd98c8002
13:50:03.970 [WinInj] INT3 trap @ 0x00007fff6f470a74 for PID 4244
13:50:03.970 [WinCreateProc] Entering CreateProc handler
13:50:03.970 [WinCreateProc] CreateProc -> ResumeThread call ended. RAX: 0x1
13:50:03.970 [WinCreateProc] Resume successful
13:50:04.503 [WinInj] INT3 trap @ 0x00007fff6f470a74 for PID 4244
13:50:04.503 [WinCreateProc] Entering CreateProc handler
13:50:04.503 [WinCreateProc] CreateProc step4. Detected: 0

>>> 13:50:04.781 [WinCreateProc] Termination of process detected
>>> 13:55:31.261 [WinInj] INT3 trap @ 0x00007fff6f470a74 for PID 4244

13:55:31.261 [WinCreateProc] Entering CreateProc handler
13:55:31.261 [WinCreateProc] CreateProc step4. Detected: 1
13:55:31.261 [WinInj] Finished injection loop
{"Plugin": "inject", "TimeStamp": "1703685331.261414", "Method": "CreateProc", "Status": "Success", "ProcessName": "cmd", "Arguments": "/C ipconfig /release >nul", "InjectedPid": 7228, "InjectedTid": 5556}

The logs were added along the injector path (64-bit) that creates the remote process and waits for its termination. As the lines marked with >>> show, about 5 minutes pass between the termination of the injected process and the next hit of the explorer user-mode breakpoint, and depending on the situation the breakpoint might never be hit again.
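To make the size of the stall concrete, the gap between the two marked log lines can be computed directly from their timestamps. This is only a small sketch: the helper name `gap_seconds` is ours and is not part of drakvuf, and it assumes both timestamps fall on the same day.

```python
from datetime import datetime


def gap_seconds(earlier: str, later: str) -> float:
    """Return the number of seconds between two HH:MM:SS.mmm log timestamps.

    Assumes both timestamps belong to the same day (no midnight rollover).
    """
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(later, fmt) - datetime.strptime(earlier, fmt)
    return delta.total_seconds()


# Timestamps copied from the two lines marked with >>> in the log above.
terminated = "13:50:04.781"   # "Termination of process detected"
next_bp_hit = "13:55:31.261"  # next "INT3 trap" for PID 4244

print(gap_seconds(terminated, next_bp_hit))  # 326.48 seconds, i.e. about 5 min 26 s
```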

We noticed that the "injection" user-mode breakpoint is placed on an arbitrary address taken from the trap frame. As far as we understand, after the injector's STEP3 (return of the ResumeThread call) there is no guarantee that explorer.exe will execute this code again. Could this be the reason for the hang?

Is there any reason why the "STEP4" code cannot be called directly from the wait_for_termination_cb function instead of relying on the explorer breakpoint?

We also noticed that in the related sandbox issue (https://github.com/CERT-Polska/drakvuf-sandbox/pull/749), timeouts were fixed, but in our case killing the injector process in the middle of the drakvuf loop prevents any other instances from being launched, with this error

tklengyel commented 10 months ago

If you kill any drakvuf/injector process with SIGTERM you should not get that error when you try to launch again. If you kill -9 then you would get that error. You can also specify a timeout for injector after which it would exit cleanly in case the injection didn't go through in time.

BonusPlay commented 10 months ago

DRAKVUF-Sandbox does subprocess.run() with a timeout (https://github.com/CERT-Polska/drakvuf-sandbox/blob/master/drakrun/drakrun/injector.py#L99), which seems to send SIGKILL after the timeout has expired (https://github.com/python/cpython/blob/main/Lib/subprocess.py#L554-L556). The easiest fix is in DRAKVUF-Sandbox (change SIGKILL to SIGTERM, or add a timeout param to injector); the proper fix is to figure out why injector sometimes doesn't finish.
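A minimal sketch of the "SIGTERM instead of SIGKILL" variant, using only the Python standard library. This is not the actual DRAKVUF-Sandbox code: the function name `run_with_graceful_timeout` and the `grace` parameter are hypothetical, and the final `kill()` fallback is an assumption that would still leave things in the state described above for `kill -9`.

```python
import subprocess


def run_with_graceful_timeout(args, timeout, grace=5.0):
    """Run a command and return its exit code.

    On timeout, send SIGTERM first so the child can clean up,
    and fall back to SIGKILL only if it ignores SIGTERM.
    """
    proc = subprocess.Popen(args)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Popen.terminate() sends SIGTERM on POSIX, unlike the SIGKILL
        # that subprocess.run() uses when its own timeout expires.
        proc.terminate()
        try:
            return proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            # Last resort (assumption): the child did not exit after SIGTERM.
            proc.kill()
            return proc.wait()
```

On POSIX, a child that exits because of the SIGTERM is reported with a negative return code (`-signal.SIGTERM`), which lets the caller tell a clean timeout apart from a normal exit.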

psrok1 commented 3 months ago

I did some research on this issue. tl;dr: Lots of threads in explorer are pretty short-lived and can be terminated in the middle of injection by another thread. On the other hand, if you pin the injector to one specific thread (I guess one involved in message processing), you should get pretty reliable results. I haven't tested it on a production setup though.

More information: https://github.com/CERT-Polska/drakvuf-sandbox/issues/933

I think it would be even better to run long-running injector actions in a dedicated thread created for that purpose instead of relying on hijacked ones. This way we're not delaying things that explorer.exe does and that may have timeouts/hang detection.

tklengyel / drakvuf

Injector sometimes hangs while injecting commands into explorer #1758