tklengyel / drakvuf

DRAKVUF Black-box Binary Analysis
https://drakvuf.com

Drakvuf on KVM #679

Open mtarral opened 5 years ago

mtarral commented 5 years ago

As VMI on KVM is growing and gaining maturity, I wanted to get an overview of the refactoring required to port DRAKVUF to KVM.

Gaining multi-hypervisor compatibility

As of today, DRAKVUF's design is split: it uses the LibVMI library as its main abstraction layer for introspection, but also makes direct calls to Xen.

Listing the linked libraries:

$ rabin2 -l ./src/drakvuf
[Linked libraries]
libvmi.so.0
libm.so.6
libdl.so.2
libjson-c.so.3
libglib-2.0.so.0
libxentoollog.so.1
libxenlight.so.4.12
libxenctrl.so.4.12
libxenforeignmemory.so.1
libxencall.so.1
libpthread.so.0
libstdc++.so.6
libgcc_s.so.1
libc.so.6

14 libraries

Listing the Xen functions being called:

$ rabin2 -i ./src/drakvuf | grep xc_
25:  23 0x00059170  GLOBAL    FUNC xc_interface_close
39:  37 0x00059250  GLOBAL    FUNC xc_domain_getinfo
70:  68 0x00059410  GLOBAL    FUNC xc_altp2m_set_domain_state
71:  69 0x00059420  GLOBAL    FUNC xc_evtchn_open
75:  73 0x00059460  GLOBAL    FUNC xc_domain_unpause
84:  82 0x000594e0  GLOBAL    FUNC xc_memshr_nominate_gfn
96:  94 0x00059590  GLOBAL    FUNC xc_map_foreign_range
103: 101 0x00059600  GLOBAL    FUNC xc_domain_decrease_reservation_exact
119: 117 0x000596f0  GLOBAL    FUNC xc_evtchn_fd
125: 123 0x00059750  GLOBAL    FUNC xc_altp2m_change_gfn
132: 130 0x000597b0  GLOBAL    FUNC xc_interface_open
151: 149 0x000598c0  GLOBAL    FUNC xc_altp2m_destroy_view
153: 151 0x000598e0  GLOBAL    FUNC xc_evtchn_close
154: 152 0x000598f0  GLOBAL    FUNC xc_domain_pause
158: 156 0x00059930  GLOBAL    FUNC xc_memshr_control
175: 173 0x00059a10  GLOBAL    FUNC xc_domain_populate_physmap_exact
177: 175 0x00059a30  GLOBAL    FUNC xc_altp2m_switch_to_view
190: 188 0x00059b00  GLOBAL    FUNC xc_altp2m_create_view
208: 206 0x00059c00  GLOBAL    FUNC xc_domain_setmaxmem
234: 232 0x00059d50  GLOBAL    FUNC xc_memshr_share_gfns
239: 237 0x00059d80  GLOBAL    FUNC xc_domain_maximum_gpfn

A couple of questions regarding this:

- main question: what is preventing us from rebasing the VMI calls on LibVMI only?
- why are we initializing the Xen interface by ourselves, and using xc_evtchn_fd for external monitoring? Are there any technical/historical reasons for this?
- why call xc_domain_pause/unpause directly instead of using LibVMI, which provides these interfaces?
- regarding xc_altp2m, it's a matter of implementing the vmi_slat functions in LibVMI.
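To make the question concrete: several of the xc_* functions listed above already have LibVMI counterparts. A hedged sketch of what the mapping could look like (assuming an already-initialized `vmi_instance_t`; error handling reduced to early returns, and this is not the actual DRAKVUF code):

```c
#include <libvmi/libvmi.h>

/* Sketch: some direct Xen calls and their LibVMI wrappers
 * (vmi is assumed to be an initialized vmi_instance_t):
 *
 *   xc_domain_pause(xch, domid)    ->  vmi_pause_vm(vmi)
 *   xc_domain_unpause(xch, domid)  ->  vmi_resume_vm(vmi)
 *   xc_altp2m_* view management    ->  vmi_slat_* (libvmi/slat.h)
 */
static void pause_for_snapshot(vmi_instance_t vmi)
{
    if (vmi_pause_vm(vmi) != VMI_SUCCESS)
        return;

    /* ... introspect guest state while it is quiesced ... */

    vmi_resume_vm(vmi);
}
```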

Refactoring breakpoints to support hypervisors without an alternate SLAT API

Xen is the only hypervisor providing an alternate SLAT API.

We could wait for other hypervisors to catch up, but seeing how difficult altp2m is to implement, that could take years.

Another strategy could be to offer an alternative way of handling breakpoints: instruction emulation.

Instruction emulation is not bullet-proof, as we already discussed previously (#667), but it would at least make DRAKVUF compatible with other hypervisors, expanding the community even further (more plugins, more contributors, more users).

This is an open discussion; I'd like your comments.

Thanks.

tklengyel commented 5 years ago

> why are we initializing the Xen interface by ourselves, and using xc_evtchn_fd for external monitoring? Are there any technical/historical reasons for this?

There are users of DRAKVUF who poll on multiple fd's in the event loop.

> why call xc_domain_pause/unpause directly instead of using LibVMI, which provides these interfaces?

The pause count is reference-counted by Xen, and the LibVMI API just issues a single unpause call, potentially leaving the domain paused. The pause call in LibVMI is also only issued if the domain is not already paused, so if we actually do want to increase the pause count, it won't work.

> regarding xc_altp2m, it's a matter of implementing the vmi_slat functions in LibVMI.

Correct, that wasn't there when I implemented the altp2m breakpoints in DRAKVUF. I didn't really bother porting it over to LibVMI since no other hypervisor has, or is likely to have, an altp2m-like capability.
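For reference, LibVMI does now expose altp2m through its `vmi_slat_*` interface (libvmi/slat.h), so the view management could in principle be moved over. A hedged sketch of what that might look like (`vmi` is assumed to be an already-initialized `vmi_instance_t`, the shadow frame is assumed to already hold the 0xCC breakpoint bytes, and the actual DRAKVUF wiring is much more involved):

```c
#include <libvmi/libvmi.h>
#include <libvmi/slat.h>

/* Sketch: create and activate a shadow view for breakpoints using
 * LibVMI's vmi_slat_* API instead of direct xc_altp2m_* calls.
 * Error handling is reduced to early returns. */
static bool setup_shadow_view(vmi_instance_t vmi, addr_t orig_gfn,
                              addr_t shadow_gfn, uint16_t *view)
{
    /* Roughly equivalent to xc_altp2m_set_domain_state(xch, domid, true). */
    if (vmi_slat_set_domain_state(vmi, true) != VMI_SUCCESS)
        return false;

    /* Roughly equivalent to xc_altp2m_create_view(). */
    if (vmi_slat_create(vmi, view) != VMI_SUCCESS)
        return false;

    /* Remap the guest frame to the shadow copy containing the
     * breakpoints (xc_altp2m_change_gfn equivalent). */
    if (vmi_slat_change_gfn_physical(vmi, *view, orig_gfn,
                                     shadow_gfn) != VMI_SUCCESS)
        return false;

    /* Activate the view (xc_altp2m_switch_to_view equivalent). */
    return vmi_slat_switch(vmi, *view) == VMI_SUCCESS;
}
```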

tklengyel commented 5 years ago

> Instruction emulation is not bullet-proof, as we already discussed previously (#667), but it would at least make DRAKVUF compatible with other hypervisors, expanding the community even further (more plugins, more contributors, more users).

While in principle adding another hypervisor sounds nice, doing so with a different breakpoint mechanism would increase complexity tremendously. Instruction emulation is an acceptable route when there is no other way to do it - after all, I was going down that road to begin with, which is the only reason it's even available in Xen - but it would probably be a lot easier to detect. The system would therefore have different stealth properties on different hypervisors. So unless the other hypervisor has a feature we must have, I would not bother porting it. For example, I do prioritize supporting Bareflank/Boxy over KVM, because with it we gain flexibility and abilities that would be hard to implement with any of the other hypervisors (including Xen).