Open carttam opened 8 months ago
Hello, I am writing here because I was about to open a new issue, but maybe we have the same problem. I am experiencing domain freezes while running Codemon to monitor the whole userspace on Windows 10 20H1, and my output looks very similar to the one above. It happens only sometimes and, at the moment, I cannot reproduce the error at will. I suppose there is some trouble in managing events. My guess is that some event is not handled correctly because of a lack of atomicity when removing/adding events, and the domain stays suspended during singlestepping, but I have no idea how to verify that this is the case. Thanks in advance for the help, Alessandro
Debugging that type of error is really difficult. What may help is to verify whether this is a new issue or whether you had the same problem with older versions. If it's an issue that only happens with a newer version, then some recent change might have broken the logic, which should be easier to fix. If it's happening with older versions as well, then the logic was already broken and it's much harder to figure out why.
Ok ok, I would like to try to debug it, but I am not yet very proficient with Xen. As I was saying, my suspicion is that event management is somehow broken. Maybe by going through the vm_event interface I could figure out what makes my domU hang, by dumping events and checking which one is not handled by the libvmi+drakvuf+codemon stack. As an alternative, I could try to write a more concise stress test for memaccess events to understand what is wrong. Do you have any advice for me, Tamas?
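One way to approach the "check which event is not handled" idea is to keep a small ledger of events received versus responses sent, per vCPU. The following is only a minimal sketch under stated assumptions: `vcpu_ledger_t`, `ledger_request`, `ledger_response`, `ledger_report_pending` and `MAX_VCPUS` are hypothetical names invented for illustration and are not part of LibVMI, DRAKVUF or Xen. It would have to be wired into the real event callbacks by hand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_VCPUS 64  /* assumption: upper bound chosen for this sketch */

/* Hypothetical per-vCPU record: how many events arrived and how many were answered. */
typedef struct {
    unsigned long requests;   /* events received for this vCPU */
    unsigned long responses;  /* responses sent back for this vCPU */
    unsigned long last_gfn;   /* frame number of the most recent event */
} vcpu_ledger_t;

static vcpu_ledger_t ledger[MAX_VCPUS];

/* Call when an event arrives, e.g. at the top of a callback. */
static bool ledger_request(unsigned vcpu, unsigned long gfn) {
    if (vcpu >= MAX_VCPUS) return false;
    ledger[vcpu].requests++;
    ledger[vcpu].last_gfn = gfn;
    return true;
}

/* Call once the response for that event has been sent.
 * Returns false if there was no outstanding event to answer. */
static bool ledger_response(unsigned vcpu) {
    if (vcpu >= MAX_VCPUS) return false;
    if (ledger[vcpu].responses >= ledger[vcpu].requests) return false;
    ledger[vcpu].responses++;
    return true;
}

/* Print and count the vCPUs that still have an event waiting for a response. */
static int ledger_report_pending(FILE *out) {
    assert(out != NULL);
    int pending = 0;
    for (unsigned i = 0; i < MAX_VCPUS; i++) {
        if (ledger[i].requests > ledger[i].responses) {
            fprintf(out, "vcpu %u: unanswered event, last gfn 0x%lx\n",
                    i, ledger[i].last_gfn);
            pending++;
        }
    }
    return pending;
}
```

Calling `ledger_report_pending(stderr)` from a watchdog or a signal handler once the domain appears frozen would show whether any vCPU is still waiting on an event that was never answered.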
I tested version 1.0 and this problem was also present. I noticed that, with `PRINT_DEBUG` set, the output of the event callback and of the event struct was similar to the previous times the callback was called. Across many tests, I could not find any conditions under which this error reliably occurs.
I just realized that if I filter on only the affected process, Xen does not freeze. For example, if a previous run froze after a ReturnHook in the chrome.exe process, running Drakvuf with `-C --context-process chrome.exe` does not freeze Xen.
I tried to reproduce the current state by deliberately breaking the code, for example by changing the event return value or by changing `event->interrupt_event.reinject` or `drakvuf->in_callback`. All of these attempts crashed Drakvuf itself, and Xen did not freeze.
Has such a problem happened before? Or do you know the reasons that can cause this problem?
Thank you for your great project, I hope it can be solved
At last, I was able to make Xen freeze at the beginning of the execution by commenting out this part of the code.
Hello,
After many tests, I realized that the problem arises from using the `vmi_slat_change_gfn` function to change the GFN to 0. I still don't know why this happens.
Anyway, using the `vmi_set_mem_event` function to change the access level to `VMI_MEMACCESS_N` solved the problem.
https://github.com/tklengyel/drakvuf/blob/1859dc9657e5ccab5ce925fe60980378544f2f88/src/libdrakvuf/vmi.c#L1184-L1198
https://github.com/tklengyel/drakvuf/blob/1859dc9657e5ccab5ce925fe60980378544f2f88/src/libdrakvuf/vmi.c#L1216-L1229
For example: `vmi_set_mem_event(vmi, container->memaccess.gfn, VMI_MEMACCESS_N, drakvuf->altp2m_idx)`
Yea, don't do that. That disables the core functionality of DRAKVUF and it makes the breakpoints detectable by the guest.
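To illustrate why lifting the access restriction can expose breakpoints, here is a toy model. It is not DRAKVUF or LibVMI code. It assumes the usual shadow-page scheme, in which the breakpoint is planted in a copy of the page and guest reads of that copy are trapped and served from the original. All names (`toy_frame_t`, `toy_plant_breakpoint`, `toy_guest_fetch`, `toy_guest_read`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096
#define BREAKPOINT 0xCC  /* x86 INT3 opcode */

/* Toy guest frame: the original contents plus a shadow copy holding the breakpoint. */
typedef struct {
    uint8_t original[PAGE_SIZE];
    uint8_t shadow[PAGE_SIZE];
    bool trap_reads;  /* true: reads of the shadow copy are intercepted */
} toy_frame_t;

/* Copy the original into the shadow and plant a breakpoint at `offset`. */
static void toy_plant_breakpoint(toy_frame_t *f, unsigned offset) {
    assert(f != NULL);
    memcpy(f->shadow, f->original, PAGE_SIZE);
    f->shadow[offset % PAGE_SIZE] = BREAKPOINT;
    f->trap_reads = true;
}

/* Instruction fetch always goes through the shadow copy, so the breakpoint fires. */
static uint8_t toy_guest_fetch(const toy_frame_t *f, unsigned offset) {
    return f->shadow[offset % PAGE_SIZE];
}

/* Data read: an intercepted read is served from the original page;
 * without interception the guest reads the shadow and can see the 0xCC byte. */
static uint8_t toy_guest_read(const toy_frame_t *f, unsigned offset) {
    offset %= PAGE_SIZE;
    return f->trap_reads ? f->original[offset] : f->shadow[offset];
}
```

In this model, clearing `trap_reads` plays the role of setting the access level to "nothing trapped": the breakpoint still fires on execution, but a guest that reads its own code now sees the planted byte instead of the original one.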
I always encounter the same problem when I use the apimon plugin of DRAKVUF. The qemu-xen log shows the following memory-related error, and qemu-xen hangs. This happens with every recent version.
$ cat /var/log/xen/qemu-dm-*.log
VNC server running on :::5900
Locked DMA mapping while invalidating mapcache! 0000000000000eff -> 0x7f42f34f72e0 is present
qemu-system-i386: terminating on signal 1 from pid 24521 (xl)
Hi, I ran Drakvuf with the Procmon and Apimon plugins on a Windows 7 SP1 virtual machine, using a malware sample that I found on MalwareBazaar. A long while after the default browser (IE) opened, the Xen virtual machine hung and froze, and even the `xl destroy` command did not work, so I had to kill the QEMU process to force it to stop.
`xl list` result:
Here is the time of execution of the malware:
stderr log for both runs: trace1 trace2