HVM guest crashes when running Drakvuf

tklengyel / drakvuf

DRAKVUF Black-box Binary Analysis

https://drakvuf.com

HVM guest crashes when running Drakvuf #388

Closed pbraun9 closed 6 years ago

pbraun9 commented 6 years ago

When running Drakvuf against an HVM Linux guest, I see a few kernel traces for a second or two, and then the HVM guest simply crashes.

# ./src/drakvuf -r /root/linux.json -d xenial2
DRAKVUF v0.6-8a5f960
[SYSCALL] TIME:1524754516.078658 VCPU:0 CR3:0x1c140000,"kworker/0:1" UID:0 linux!sys_imageblit
[SYSCALL] TIME:1524754516.287184 VCPU:0 CR3:0x1c140000,"kworker/0:1" UID:0 linux!sys_imageblit
[SYSCALL] TIME:1524754516.491251 VCPU:0 CR3:0x1c140000,"kworker/0:1" UID:0 linux!sys_imageblit
[SYSCALL] TIME:1524754516.695204 VCPU:0 CR3:0x1c140000,"kworker/0:1" UID:0 linux!sys_imageblit

then the guest appears in the xl list output as:

(null) 11 0 1 --pscd 15.4

until I interrupt Drakvuf, at which point the (null)-named domain finally gets cleaned up.
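
For readers decoding that row: the State column of xl list is a string of flag letters. The following is a minimal sketch, assuming the six letters described in the xl man page (r, b, p, s, c, d) and the usual six-column row layout; the function names are illustrative and not part of any Xen tool.

```python
# Flag letters of the State column, as described in the xl man page.
XL_STATE_FLAGS = {
    "r": "running",
    "b": "blocked",
    "p": "paused",
    "s": "shutdown",
    "c": "crashed",
    "d": "dying",
}


def decode_xl_state(state):
    """Return the names of the flags set in an `xl list` State field."""
    return [XL_STATE_FLAGS[ch] for ch in state if ch != "-"]


def parse_xl_list_row(row):
    """Split one `xl list` row into its six columns.

    Assumes the domain name contains no whitespace, which holds for "(null)".
    """
    name, domid, mem, vcpus, state, time_s = row.split()
    return {
        "name": name,
        "id": int(domid),
        "mem": int(mem),
        "vcpus": int(vcpus),
        "state": decode_xl_state(state),
        "time": float(time_s),
    }


row = parse_xl_list_row("(null) 11 0 1 --pscd 15.4")
print(row["state"])  # ['paused', 'shutdown', 'crashed', 'dying']
```

Read this way, the row describes a domain that has crashed and is being torn down, which matches the behaviour reported here.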

When looking into the process with strace I see:

mmap(NULL, 4096, PROT_READ, MAP_SHARED, 13, 0) = 0x7f362b11c000
ioctl(13, _IOC(0, 0x50, 0x04, 0x20), 0x7fffbcde5fc0) = 0
mmap(NULL, 4096, PROT_READ, MAP_SHARED, 13, 0) = 0x7f362b11b000
ioctl(13, _IOC(0, 0x50, 0x04, 0x20), 0x7fffbcde5fc0) = 0
mmap(NULL, 4096, PROT_READ, MAP_SHARED, 13, 0) = 0x7f362b11a000
ioctl(13, _IOC(0, 0x50, 0x04, 0x20), 0x7fffbcde5fc0) = 0
mmap(NULL, 4096, PROT_READ, MAP_SHARED, 13, 0) = 0x7f362b119000
ioctl(13, _IOC(0, 0x50, 0x04, 0x20), 0x7fffbcde5fc0) = 0
ioctl(17, IOCTL_EVTCHN_NOTIFY, 0x7fffbcde656c) = 0
poll([{fd=17, events=POLLIN|POLLERR}], 1, 1000) = 0 (Timeout)
ioctl(17, IOCTL_EVTCHN_NOTIFY, 0x7fffbcde656c) = 0
poll([{fd=17, events=POLLIN|POLLERR}], 1, 1000) = 0 (Timeout)
ioctl(17, IOCTL_EVTCHN_NOTIFY, 0x7fffbcde656c) = 0
poll([{fd=17, events=POLLIN|POLLERR}], 1, 1000) = 0 (Timeout)
ioctl(17, IOCTL_EVTCHN_NOTIFY, 0x7fffbcde656c) = 0
poll([{fd=17, events=POLLIN|POLLERR}], 1, 1000) = 0 (Timeout)
ioctl(17, IOCTL_EVTCHN_NOTIFY, 0x7fffbcde656c) = 0
poll([{fd=17, events=POLLIN|POLLERR}], 1, 1000
^Cstrace: Process 1160 detached
 <detached ...>

Also, in /var/log/xen/xenstored-access.log I get far too many entries to paste in full. To give the idea:

==> /var/log/xen/xenstored-access.log <==
[20180426T15:03:57.998Z]  A160         newconn   
[20180426T15:03:57.998Z]  A161         newconn   
[20180426T15:03:57.999Z]  A162         newconn   
[20180426T15:03:57.999Z]  A163         newconn   
[20180426T15:03:57.999Z]  A163         endconn   
[20180426T15:04:01.323Z]  A18          w event   @releaseDomain 3/0 
[20180426T15:04:01.323Z]  A3           w event   @releaseDomain domlist 
[20180426T15:04:01.323Z]  A159         w event   @releaseDomain 3/0 
[20180426T15:04:01.324Z]  A159         rm        /local/domain/0/device-model/5 

==> /var/log/syslog <==
Apr 26 18:04:01 lenovo kernel: [ 1859.928466] xenbr0: port 5(xenial2.0-emu) entered disabled state

The HVM guest configuration is as follows.

arch = 'x86_64'
name = "xenial2"
maxmem = 512
vcups = 1
maxcups = 1
builder = "hvm"
#boot = "cd"
hap = 1
acpi = 1
sdl = 1
usb = 0
altp2m = 1
shadow_memory = 16
audio=0
disk = ['file:/data/guests/xenial2/xenial2.disk,hda,w']
#       'file:/data/ISO-IMAGES/devuan.iso,hdc:cdrom,r']
vif = [ 'vifname=xenial2.0' ]

How do I run Drakvuf in debug mode? Any idea why the guest is crashing? I tried with LVM on top of a loop device and got the same result. Is LVM mandatory? If so, I would have to try against a real PV rather than a loop device.

Thank you

tklengyel commented 6 years ago

Take a look at https://github.com/tklengyel/drakvuf/wiki/Debugging-DRAKVUF

tklengyel commented 6 years ago

LVM is not mandatory, you can use whatever disk device you want. I don't think this issue is related to the disk.

pbraun9 commented 6 years ago

With a new guest called xenial3 (an Ubuntu Xenial), there is nothing on stdout, just the crash:

xenial3.debug.stderr.txt

(XEN) d15v0 vmentry failure (reason 0x80000021): Invalid guest state (0)
(XEN) ************* VMCS Area **************
(XEN) *** Guest State ***
(XEN) CR0: actual=0x000000008005003b, shadow=0x0000000080050033, gh_mask=ffffffffffffffff
(XEN) CR4: actual=0x0000000000362670, shadow=0x0000000000360670, gh_mask=ffffffffffffffff
(XEN) CR3 = 0x8000000017464000
(XEN) PDPTE0 = 0x0000000000000000  PDPTE1 = 0x0000000000000000
(XEN) PDPTE2 = 0x0000000000000000  PDPTE3 = 0x0000000000000000
(XEN) RSP = 0x00007f3d6b8ccc38 (0x00007f3d6b8ccc38)  RIP = 0xffffffff8184ef2d (0xffffffff8184ef2d)
(XEN) RFLAGS=0x00000006 (0x00000006)  DR7 = 0x0000000000000400
(XEN) Sysenter RSP=0000000000000000 CS:RIP=0010:ffffffff81851f60
(XEN)        sel  attr  limit   base
(XEN)   CS: 0010 0a09b ffffffff 0000000000000000
(XEN)   DS: 0000 1c000 ffffffff 0000000000000000
(XEN)   SS: 0018 0c093 ffffffff 0000000000000000
(XEN)   ES: 0000 1c000 ffffffff 0000000000000000
(XEN)   FS: 0000 1c000 ffffffff 00007f3d6b8cd700
(XEN)   GS: 0000 1c000 ffffffff ffff88001f400000
(XEN) GDTR:            0000007f ffff88001f40c000
(XEN) LDTR: 0000 1c000 ffffffff 0000000000000000
(XEN) IDTR:            00000fff ffffffffff574000
(XEN)   TR: 0040 0008b 00002087 ffff88001f4048c0
(XEN) EFER = 0x0000000000000000  PAT = 0x0407010600070106
(XEN) PreemptionTimer = 0x00000000  SM Base = 0x00000000
(XEN) DebugCtl = 0x0000000000000000  DebugExceptions = 0x0000000000000000
(XEN) PerfGlobCtl = 0x0000000000000000  BndCfgS = 0x0000000000000000
(XEN) Interruptibility = 00000000  ActivityState = 00000000
(XEN) *** Host State ***
(XEN) RIP = 0xffff82d08030a140 (vmx_asm_vmexit_handler)  RSP = 0xffff83050fd47f90
(XEN) CS=e008 SS=0000 DS=0000 ES=0000 FS=0000 GS=0000 TR=e040
(XEN) FSBase=0000000000000000 GSBase=0000000000000000 TRBase=ffff83050fd4ec80
(XEN) GDTBase=ffff83050fd3e000 IDTBase=ffff83050fd4a000
(XEN) CR0=000000008005003b CR3=0000000457286000 CR4=00000000003526e0
(XEN) Sysenter RSP=ffff83050fd47fc0 CS:RIP=e008:ffff82d080348ba0
(XEN) EFER = 0x0000000000000000  PAT = 0x0000050100070406
(XEN) *** Control State ***
(XEN) PinBased=0000003f CPUBased=b6a0e5fa SecondaryExec=001254eb
(XEN) EntryControls=000153ff ExitControls=008fefff
(XEN) ExceptionBitmap=0006008a PFECmask=00000000 PFECmatch=00000000
(XEN) VMEntry: intr_info=000000f3 errcode=00000000 ilen=00000000
(XEN) VMExit: intr_info=00000000 errcode=00000000 ilen=00000003
(XEN)         reason=80000021 qualification=0000000000000000
(XEN) IDTVectoring: info=00000000 errcode=00000000
(XEN) TSC Offset = 0xffffdf608612721d  TSC Multiplier = 0x0000000000000000
(XEN) TPR Threshold = 0x00  PostedIntrVec = 0x00
(XEN) EPT pointer = 0x000000041774f01e  EPTP index = 0x0000
(XEN) PLE Gap=00000080 Window=00001000
(XEN) Virtual processor ID = 0x08d6 VMfunc controls = 0000000000000000
(XEN) **************************************
(XEN) domain_crash called from vmx.c:3337
(XEN) Domain 15 (vcpu#0) crashed on cpu#3:
(XEN) ----[ Xen-4.9.1  x86_64  debug=n   Not tainted ]----
(XEN) CPU:    3
(XEN) RIP:    0010:[<ffffffff8184ef2d>]
(XEN) RFLAGS: 0000000000000006   CONTEXT: hvm guest (d15v0)
(XEN) rax: 8000000017464000   rbx: 00000000000012da   rcx: 00007ffdd43e9b39
(XEN) rdx: 0000000000000000   rsi: 00007f3d6b8ccc90   rdi: 0000000000000001
(XEN) rbp: 00007f3d6b8ccc60   rsp: 00007f3d6b8ccc38   r8:  0000000000000007
(XEN) r9:  0000000000000001   r10: 00007f3d64001880   r11: 0000000000000246
(XEN) r12: 00007f3d6b8ccc44   r13: 00188de0c5800000   r14: 0000000000000000
(XEN) r15: 0000000000000000   cr0: 0000000080050033   cr4: 0000000000360670
(XEN) cr3: 8000000017464000   cr2: 00007f1368670090
(XEN) fsb: 00007f3d6b8cd700   gsb: ffff88001f400000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0018   cs: 0010
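
The first line of the dump above (reason 0x80000021) can be decoded by hand. The following is a minimal sketch, assuming the exit-reason layout and basic exit reason numbers from the Intel SDM (bit 31 flags a VM-entry failure, the low 16 bits hold the basic reason). The SDM also lists, among the guest-state checks, that bits 63:52 of the guest CR3 field must be zero; the second helper tests the dumped CR3 against that one rule purely as an illustration. The thread does not establish which check actually failed, and the function names are hypothetical.

```python
# Basic exit reasons that denote a failed VM entry (Intel SDM, Appendix C).
BASIC_EXIT_REASONS = {
    33: "VM-entry failure due to invalid guest state",
    34: "VM-entry failure due to MSR loading",
    41: "VM-entry failure due to machine-check event",
}

VMENTRY_FAILURE_BIT = 1 << 31


def decode_exit_reason(reason):
    """Split a raw VMX exit reason into the failure flag and basic reason."""
    basic = reason & 0xFFFF
    return {
        "vmentry_failure": bool(reason & VMENTRY_FAILURE_BIT),
        "basic_reason": basic,
        "description": BASIC_EXIT_REASONS.get(basic, "not in this table"),
    }


def cr3_reserved_bits_set(cr3):
    """True if any of bits 63:52 of a guest CR3 value is set."""
    return (cr3 >> 52) != 0


info = decode_exit_reason(0x80000021)
print(info["vmentry_failure"], info["basic_reason"], info["description"])
print(cr3_reserved_bits_set(0x8000000017464000))  # CR3 value from the dump
```

For the values in this dump, the reason decodes to basic reason 33 with the failure bit set, and the CR3 value has bit 63 set. That may be worth raising on xen-devel, but it is an observation about the dump and not a confirmed diagnosis.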

tklengyel commented 6 years ago

This looks like a vmx_failed_vmentry, so this is quite a low-level bug. I would also like to see the DRAKVUF debug logs to see what happens there, but to me this looks like a strange Xen issue. Perhaps try upgrading your Xen installation to Xen 4.10 and see if you still have the problem.

pbraun9 commented 6 years ago

With XEN 4.10.0 and Drakvuf recompiled against the latest libvmi git repository, I get the same issue. This is with the previously used xenial2 guest, now with vcups = 2, maxcups = 2, altp2m = 2. Does that last parameter need to reflect the number of vcpus?

DRAKVUF debug: xenial2.debug___.stderr.txt

XEN dmesg xen.xenial2.crash.txt

Update: XEN 4.10.0 dmesg starts with (XEN) parameter "flask_enforcing" unknown!, which does not look good.

pbraun9 commented 6 years ago

I now understand that XSM/Flask is NOT required for Drakvuf to run, so neither is the flask* boot argument. As for alt2pm=X, I cannot find any hint anywhere on how to define X.

@tklengyel, hi, am I providing the required material, namely the Drakvuf debug output as requested? Here's another one: drakvuf debug / xen 4.9.1 / libvmi 0.13 / rekall 1.7.1

tklengyel commented 6 years ago

No, on the Xen command line altp2m is just a boolean parameter. Refer to https://xenbits.xen.org/docs/unstable/misc/xen-command-line.html if you want more details. This is not a bug in DRAKVUF but a bug in Xen; however, from the logs posted I can't see what is wrong. Your best bet would be either to debug this yourself or to ask for help on the xen-devel mailing list.

pbraun9 commented 6 years ago

flask_enforcing= and flask= are Xen boot arguments (for 4.9-? and 4.10 respectively), whereas alt2pm= is a guest configuration option (and is not mentioned in the Xen command line guide at all). Sorry for mentioning both without any transition. The alt2pm=2 is taken from the guest configuration in the tutorial on drakvuf.com. So I guess my question remains.

tklengyel commented 6 years ago

No, altp2m is both a Xen command line argument as detailed in the document I linked, and a guest configuration option. You have to have both enabled properly for DRAKVUF to work. But neither that nor the flask option being enabled (or not) would cause the failed vmentry you are seeing. Your bug is likely somewhere else.

pbraun9 commented 6 years ago

Right, it is altp2m not alt2pm. But I am still wondering how to define that setting regarding the guest configuration.
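
Since a misspelled option name is easy to miss, a quick check of the guest configuration may help. The following is a minimal sketch, assuming a small and deliberately incomplete list of xl.cfg option names; the helper name is hypothetical. It flags keys that are not in the list but closely resemble one that is, such as the alt2pm spelling above, or vcups, which appears in the configurations posted in this thread and resembles the documented vcpus.

```python
import difflib

# Assumed, deliberately incomplete list of xl.cfg option names.
KNOWN_KEYS = ["name", "type", "builder", "memory", "maxmem", "vcpus",
              "maxvcpus", "hap", "altp2m", "boot", "sdl", "disk", "vif"]


def suspicious_keys(config_text, cutoff=0.7):
    """Map each unknown key to the known option name it most resembles."""
    hits = {}
    for line in config_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if "=" not in line:
            continue  # continuation line of a list, or blank
        key = line.split("=", 1)[0].strip()
        if key in KNOWN_KEYS:
            continue
        match = difflib.get_close_matches(key, KNOWN_KEYS, n=1, cutoff=cutoff)
        if match:
            hits[key] = match[0]
    return hits


sample = """
name = "example"
alt2pm = 2
vcups = 1
vif = [ 'vifname=example.0' ]
"""
print(suspicious_keys(sample))  # {'alt2pm': 'altp2m', 'vcups': 'vcpus'}
```

Keys that resemble nothing in the list are left alone, so the check only reports likely typos and says nothing about whether an unfamiliar option is valid.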

Back to our issue: I played around with the hap setting (both on xen.gz and in the guest config) and observed that whenever it was disabled for the guest, the whole host crashed.

Then, as long as hap is enabled for that guest (explicitly or by default), I get back to my Xen bug, with a Drakvuf debug trace that looks good (why were the previous traces so much larger?):

# src/drakvuf -v -r /root/xenial3.json -d xenial3
DRAKVUF v0.6-1c2a1b0
Starting DRAKVUF initialization
Init VMI on domID 1 -> xenial3
Max GPFN: 0xff001
Max mem set? 0
Physmap populated? 0
Altp2m enabled? 0
Altp2m view X created? 0 with ID 1
Altp2m view R created? 0 with ID 2
Switched Altp2m view to X? 0
libdrakvuf initialized
DRAKVUF initializated
Starting plugins
Starting plugin syscalls
Starting plugin syscalls finished
Starting plugin poolmon
Starting plugin filetracer
Starting plugin filedelete
Starting plugin objmon
Starting plugin exmon
Starting plugin ssdtmon
Starting plugin debugmon
Starting plugin debugmon finished
Starting plugin cpuidmon
Starting plugin cpuidmon finished
Starting plugin socketmon
Starting plugin regmon
Starting plugin procmon
Beginning DRAKVUF loop
Started DRAKVUF loop
^CDRAKVUF loop finished
Finished DRAKVUF loop
starting close_vmi_drakvuf
close_vmi_drakvuf finished

So, for the record, it looks to me like hap=false is NOT required as a xen.gz boot argument for Drakvuf to run. I am now doing my tests and troubleshooting with this reduced set of parameters: (XEN) Command line: dom0_mem=12288M altp2m=1. As for the guest configuration, I suppose the following reduced set of settings would also be good. I am not sure about maxmem vs memory, though.

#DRAKVUF
altp2m=1
#HVM
type = "hvm"
boot = "cd"
sdl = 1
#PV
name = "xenial3"
memory = 512
vcups = 1
disk = ['tap:aio:/data/guests/xenial3/xenial3.disk,xvda,w',
       'file:/data/ISO-IMAGES/ubuntu-16.04.4-server-amd64.iso,hdc:cdrom,r']
vif = [ 'vifname=xenial3.0' ]

tklengyel commented 6 years ago

There is no such boot param as hap=false. Turning off the 1gb/2mb page sizes is not required but is advised, otherwise Xen will have to shatter pages at runtime. These likely have nothing to do with your crash.

pbraun9 commented 6 years ago

There is such a parameter as hap=; I am not making it up:

hap (x86)

    = <boolean>

    Default: true

Flag to globally enable or disable support for Hardware Assisted Paging (HAP)

Ok, I have upgraded my machine's firmware and I am up for another round of testing.

For the record, the setup I am using now. The guest config:

type = "hvm"
sdl = 1

altp2m = 2
maxmem = 512

name = "devuanhvm"
#memory = 512
vcups = 2
disk = ['tap:qcow2:/data/guests/devuanhvm/devuanhvm.qcow2,xvda,w',
        'file:/data/ISO-IMAGES/devuan.iso,hdc:cdrom,r']
vif = [ 'vifname=devuanhvm.0' ]

The XEN 4.10.0 (+XSM but disabling it with flask=disabled) boot options:

dom0_mem=4096M,max:4096M dom0_max_vcpus=1 dom0_vcpus_pin=true hap_1gb=false hap_2mb=false altp2m=1 flask=disabled
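
To double-check which of these options are actually in effect, the command line can be split into key/value pairs. The following is a minimal sketch; the accepted boolean spellings are an assumption modelled on common boolean parsing, and the helper names are hypothetical.

```python
# Spellings treated as booleans here; an assumption, not taken from Xen's source.
FALSE_WORDS = {"0", "no", "off", "false", "disable"}
TRUE_WORDS = {"1", "yes", "on", "true", "enable"}


def parse_cmdline(cmdline):
    """Split a boot command line into {option: raw value}; bare flags map to ""."""
    opts = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        opts[key] = value
    return opts


def as_bool(value):
    """Return True/False for a recognised boolean spelling, else None."""
    v = value.lower()
    if v in TRUE_WORDS:
        return True
    if v in FALSE_WORDS:
        return False
    return None


cmdline = ("dom0_mem=4096M,max:4096M dom0_max_vcpus=1 dom0_vcpus_pin=true "
           "hap_1gb=false hap_2mb=false altp2m=1 flask=disabled")
opts = parse_cmdline(cmdline)
print("altp2m enabled:", as_bool(opts.get("altp2m", "")))
print("hap_1gb / hap_2mb:", [as_bool(opts.get(k, "")) for k in ("hap_1gb", "hap_2mb")])
```

On a running host, the string to feed in is the one that xl dmesg prints after "(XEN) Command line:".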

Without dom0_max_vcpus and/or dom0_vcpus_pin, running Drakvuf crashes the whole machine. I did not investigate this further, although loglvl=all noreboot might help. So I guess I found out why the host was crashing.

As for the guest crashes, they still happen, and here is a new snippet of debug output:

Switching to altp2m view 0 on vCPU 0
Post mem cb @ 0xb0e4fa0 vCPU 0 altp2m 0
Pre mem cb with vCPU 0 @ 0xb0e4fa0 in view 1: r--
Switching to altp2m view 0 on vCPU 0
Post mem cb @ 0xb0e4fa0 vCPU 0 altp2m 0
Pre mem cb with vCPU 0 @ 0xb0e4f80 in view 1: r--
Switching to altp2m view 0 on vCPU 0
Post mem cb @ 0xb0e4f80 vCPU 0 altp2m 0
Pre mem cb with vCPU 0 @ 0xb0e4fa0 in view 1: r--
Switching to altp2m view 0 on vCPU 0
Post mem cb @ 0xb0e4fa0 vCPU 0 altp2m 0
Pre mem cb with vCPU 0 @ 0xb0e4fa0 in view 1: r--
Switching to altp2m view 0 on vCPU 0
Post mem cb @ 0xb0e4fa0 vCPU 0 altp2m 0
Pre mem cb with vCPU 0 @ 0xb0e4f88 in view 1: rw-
Switching to altp2m view 0 on vCPU 0
Post mem cb @ 0xb0e4f88 vCPU 0 altp2m 0
Re-copying remapped gfn
Pre mem cb with vCPU 0 @ 0xb0e4f88 in view 1: r--
Switching to altp2m view 0 on vCPU 0
Post mem cb @ 0xb0e4f88 vCPU 0 altp2m 0
Pre mem cb with vCPU 0 @ 0xb0e4f78 in view 1: rw-
Switching to altp2m view 0 on vCPU 0
Post mem cb @ 0xb0e4f78 vCPU 0 altp2m 0
Re-copying remapped gfn
Pre mem cb with vCPU 0 @ 0xb0e4f80 in view 1: rw-
Switching to altp2m view 0 on vCPU 0
Post mem cb @ 0xb0e4f80 vCPU 0 altp2m 0
Re-copying remapped gfn
Pre mem cb with vCPU 0 @ 0xb0e4fa0 in view 1: r--
Switching to altp2m view 0 on vCPU 0
Post mem cb @ 0xb0e4fa0 vCPU 0 altp2m 0
Pre mem cb with vCPU 0 @ 0xb0e4fa0 in view 1: rw-
Switching to altp2m view 0 on vCPU 0
Post mem cb @ 0xb0e4fa0 vCPU 0 altp2m 0
Re-copying remapped gfn
Pre mem cb with vCPU 0 @ 0xb0e4fa0 in view 1: r--
Switching to altp2m view 0 on vCPU 0
Post mem cb @ 0xb0e4fa0 vCPU 0 altp2m 0
[SYSCALL] TIME:1528270679.635098 VCPU:0 CR3:0x1eb4a000,"kworker/0:1" UID:0 linux!sys_imageblit
Switching altp2m and to singlestep on vcpu 0
reset trap on vCPU 0, switching altp2m 0->1
Pre mem cb with vCPU 0 @ 0xb127980 in view 1: r--
[...]
Pre mem cb with vCPU 0 @ 0xb0d98a0 in view 1: r--
Switching to altp2m view 0 on vCPU 0
Post mem cb @ 0xb0d98a0 vCPU 0 altp2m 0
[SYSCALL] TIME:1528270679.861790 VCPU:0 CR3:0x1eb4a000,"kworker/0:1" UID:0 linux!sys_imageblit
Switching altp2m and to singlestep on vcpu 0
reset trap on vCPU 0, switching altp2m 0->1
Pre mem cb with vCPU 0 @ 0xb0d98a0 in view 1: r--
Switching to altp2m view 0 on vCPU 0
Post mem cb @ 0xb0d98a0 vCPU 0 altp2m 0
[SYSCALL] TIME:1528270680.065850 VCPU:0 CR3:0x1eb4a000,"kworker/0:1" UID:0 linux!sys_imageblit
Switching altp2m and to singlestep on vcpu 0
reset trap on vCPU 0, switching altp2m 0->1
Pre mem cb with vCPU 0 @ 0xb0d98a0 in view 1: r--
[...]
Switching to altp2m view 0 on vCPU 0
Post mem cb @ 0xb0d98a0 vCPU 0 altp2m 0
[SYSCALL] TIME:1528270680.271010 VCPU:0 CR3:0x1eb4a000,"kworker/0:1" UID:0 linux!sys_imageblit
Switching altp2m and to singlestep on vcpu 0
reset trap on vCPU 0, switching altp2m 0->1

Does this possibly reveal some other root cause or am I still facing a probable xen bug?

@tklengyel I am still wondering how to use the altp2m= guest setting! Should it match the number of vcpus?

Thanks

tklengyel commented 6 years ago

The hap option is not a boot param, it is a guest configuration option. I didn't say you made it up, I think you are just confusing Xen boot options and guest config options. Those are very different.

The altp2m guest config option doesn't have to match the number of vcpus. Read the documentation https://xenbits.xen.org/docs/unstable/man/xl.cfg.5.html

The logs you posted don't explain why your guest crashes.

pbraun9 commented 6 years ago

@tklengyel, but the title reads Xen Hypervisor Command Line Options, so I guess the hap= setting does exist. Maybe this is a new setting. https://xenbits.xen.org/docs/4.9-testing/misc/xen-command-line.html https://xenbits.xen.org/docs/4.10-testing/misc/xen-command-line.html https://xenbits.xen.org/docs/unstable/misc/xen-command-line.html

OK, thank you. As for the guest configuration, 2 seems to correspond to "external".

tklengyel commented 6 years ago

Please try upgrading your Xen installation to Xen 4.11 rc6 and post whether it solves your problem.

pbraun9 commented 6 years ago

Hi. Now with XEN 4.11 rc6 and DRAKVUF v0.6-3868a26, the behavior has changed. The DRAKVUF process ends by itself with an error message:

DRAKVUF v0.6-3868a26
Starting DRAKVUF initialization
drakvuf_event_fd_add fd=14
size of list=1
regenerating event_fds and fd_info_lookup...
new event_fd i=0 for fd=14
new fd_info_lookup i=0 for fd=14
drakvuf_init: adding event_fd done
Init VMI on domID 4 -> devuanhvm
init_vmi: initializing vmi done
Max GPFN: 0xff001
Max mem set? 0
Physmap populated? 0
Altp2m enabled? 0
Altp2m view X created? 0 with ID 1
Altp2m view R created? 0 with ID 2
Switched Altp2m view to X? 0
libdrakvuf initialized
DRAKVUF initializated
Starting plugins
Starting plugin syscalls
Rekall profile: no $FUNCTIONS section found
Rekall profile defines 75360 symbols
Received 75360 symbols
[...]
Physmap populated? 0
Copied trapped page to new location
Activating remapped gfns in the altp2m views!
                Trap added @ PA 0x19904440 RPA 0xff07d440 Page 104708 for sys_acct. 
                Trap added @ PA 0x19a048e0 RPA 0xff0138e0 Page 104964 for sys_access. 
                Trap added @ PA 0x19ced9a0 RPA 0xff0229a0 Page 105709 for sys_accept4. 
                Trap added @ PA 0x19ced9b0 RPA 0xff0229b0 Page 105709 for sys_accept. 
Starting plugin syscalls finished
Starting plugin poolmon
Starting plugin filetracer
Starting plugin filedelete
Starting plugin objmon
Starting plugin exmon
Starting plugin ssdtmon
Starting plugin debugmon
Starting plugin debugmon finished
Starting plugin cpuidmon
Starting plugin cpuidmon finished
Starting plugin socketmon
Starting plugin regmon
Starting plugin procmon
Beginning DRAKVUF loop
Started DRAKVUF loop
VMI_ERROR: Error, Xen reports a VM_EVENT_INTERFACE_VERSION that doesn't match what we expected (0x00000002)!
Error waiting for events or timeout, quitting...
DRAKVUF loop finished
Finished DRAKVUF loop
starting close_vmi_drakvuf
Removed memtrap for GFN 0xff002 in altp2m view 1
close_vmi_drakvuf finished

Also, instead of showing as (null) until Drakvuf ends and then simply disappearing, the guest now remains with State ------ and its consoles (SDL and serial) no longer respond. It does not react to xl shutdown and needs to be destroyed instead. I tried this a few times and the hex codes remain. Only once was no error printed (Started DRAKVUF loop was the last message I saw), and the guest froze then too.

tklengyel commented 6 years ago

You also need to update LibVMI.

pbraun9 commented 6 years ago

OK, with the latest LibVMI from git and Drakvuf recompiled against it, I tried it right away without restarting the guest:

DRAKVUF v0.6-3868a26
Starting DRAKVUF initialization
drakvuf_event_fd_add fd=14
size of list=1
regenerating event_fds and fd_info_lookup...
new event_fd i=0 for fd=14
new fd_info_lookup i=0 for fd=14
drakvuf_init: adding event_fd done
Init VMI on domID 6 -> devuanhvm
init_vmi: initializing vmi done
Max GPFN: 0xff001
Max mem set? 0
Physmap populated? 0
Altp2m enabled? 0
Altp2m view X created? 0 with ID 1
Altp2m view R created? 0 with ID 2
Switched Altp2m view to X? 0
VMI_ERROR: xc_hvm_set_mem_access failed with code: -1
*** FAILED TO SET MEMORY TRAP @ PAGE 1044482 ***
Failed to create guard trap for the empty page!
starting close_vmi_drakvuf
close_vmi_drakvuf finished
libdrakvuf initialization failed
Failed to initialize DRAKVUF

Then I shut the guest down and tried again, and everything seems fine now, with debug,

Post mem cb @ 0x48d9844 vCPU 0 altp2m 0
Pre mem cb with vCPU 0 @ 0x48d98a0 in view 1: r--
Switching to altp2m view 0 on vCPU 0
Post mem cb @ 0x48d98a0 vCPU 0 altp2m 0
Pre mem cb with vCPU 0 @ 0x48d98a0 in view 1: r--
Switching to altp2m view 0 on vCPU 0
Post mem cb @ 0x48d98a0 vCPU 0 altp2m 0
Switching altp2m and to singlestep on vcpu 0
reset trap on vCPU 0, switching altp2m 0->1

and without debug,

[SYSCALL] TIME:1528817872.199726 VCPU:0 CR3:0x1c0ee000,"kworker/0:1" UID:0 linux!sys_imageblit
[SYSCALL] TIME:1528817872.403731 VCPU:0 CR3:0x1c0ee000,"kworker/0:1" UID:0 linux!sys_imageblit
[SYSCALL] TIME:1528817872.607729 VCPU:0 CR3:0x1c0ee000,"kworker/0:1" UID:0 linux!sys_imageblit
[SYSCALL] TIME:1528817872.817862 VCPU:0 CR3:0x1c0ee000,"kworker/0:1" UID:0 linux!sys_imageblit
[SYSCALL] TIME:1528817873.015801 VCPU:0 CR3:0x1c0ee000,"kworker/0:1" UID:0 linux!sys_imageblit
[SYSCALL] TIME:1528817873.031084 VCPU:0 CR3:0x1c0ee000,"init" UID:0 linux!sys_newstat
[SYSCALL] TIME:1528817873.031159 VCPU:0 CR3:0x1c0ee000,"init" UID:0 linux!sys_newfstat

The guest does not crash anymore.
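
For anyone post-processing the [SYSCALL] lines shown in this thread, the following is a minimal parsing sketch. The field layout is inferred only from the lines posted here (DRAKVUF v0.6) and may differ in other versions; the function name is hypothetical.

```python
import re
from collections import Counter

# Layout inferred from the [SYSCALL] lines posted in this thread.
SYSCALL_RE = re.compile(
    r'^\[SYSCALL\] TIME:(?P<time>\d+\.\d+) VCPU:(?P<vcpu>\d+) '
    r'CR3:(?P<cr3>0x[0-9a-fA-F]+),"(?P<proc>[^"]*)" UID:(?P<uid>\d+) '
    r'(?P<module>\w+)!(?P<symbol>\w+)$'
)


def parse_syscall_line(line):
    """Return the fields of one [SYSCALL] line as a dict, or None if no match."""
    m = SYSCALL_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["time"] = float(rec["time"])
    rec["vcpu"] = int(rec["vcpu"])
    rec["cr3"] = int(rec["cr3"], 16)
    rec["uid"] = int(rec["uid"])
    return rec


lines = [
    '[SYSCALL] TIME:1528817873.015801 VCPU:0 CR3:0x1c0ee000,"kworker/0:1" UID:0 linux!sys_imageblit',
    '[SYSCALL] TIME:1528817873.031084 VCPU:0 CR3:0x1c0ee000,"init" UID:0 linux!sys_newstat',
    '[SYSCALL] TIME:1528817873.031159 VCPU:0 CR3:0x1c0ee000,"init" UID:0 linux!sys_newfstat',
]
records = [r for r in map(parse_syscall_line, lines) if r is not None]
per_process = Counter(r["proc"] for r in records)
print(per_process)  # Counter({'init': 2, 'kworker/0:1': 1})
```

Lines that do not match, such as the debug messages interleaved when running with -v, are skipped and not treated as errors.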