Added support for Linux hosts

AlexAltea commented 5 years ago

Motivation

Although previously (#4) Linux support for HAXM was dismissed due to KVM being around. There's still reasons for having Linux support in this project:

Single cross-platform hypervisor API: Although QEMU supports most hypervisors, smaller cross-platform projects need to deal with multiple backends: WHPX on Windows, KVM on Linux, HVF on MacOS. HAXM can finally fill that gap and cover the 3 major platforms.
Extra features: Although KVM is a more mature hypervisor, HAXM is not a subset of it: As of today, there's certain guests that only run on HAXM [1], due to the recent emulator changes which added support for x86 instructions missing in KVM/Xen.
Kernel module: KVM is deeply integrated into Linux. Patches requires recompiling/installing the kernel. There's kvm-kmod but it's high-maintenance: Linux/KVM patches easily break kvm-kmod, and kvm-kmod patches easily break older kernels. Instead, HAXM can be built and installed in few seconds.
Permissive licensing: There's no QEMU-compatible hypervisor that supports Linux and offers a permissive license (e.g. BSD, MIT).

Additionally, we could take the chance to refactor a bit the codebase.

Changes

Normalized prefixes: Some platform-specific functions such as smp_mb or __fls had no hax_/asm_-prefix, clashing against functions declared in kernel headers.
Added Linux-specific code: Headers at include/linux/*.h, sources at linux/*.c, alongside Makefile+Kbuild files.
Added Linux to TravisCI build matrix: It should make maintenance easier.

Pending

Essential tasks:

[x] HAXM builds/links under Linux.
[x] Finishing implementation of Linux-specific functions.
[x] Testing changes.
[x] Adding build instructions.

Optional tasks that imply major changes to the codebase (needs discussion!):

Single ioctl definitions: Define ioctl's only once at hax_interface.h via HAX_IOCTL(name, code, size). Then implement the HAX_IOCTL macro on hax_interface_{darwin,windows,linux}.h to make the actual platform-specific ioctl definitions. This will reduce code duplication.

raphaelning commented 5 years ago

Thanks, Alexandro. Great to see you back with another big PR!

Windows and Mac have been and will still be the Intel HAXM team's priority, and realistically, we wouldn't add a third host OS to our support matrix. However, if the open source community wants this feature and can provide enough help, there's no reason for us to stand in the way, not to mention that this is a great chance to improve the overall code quality of the project ;-)

We'll accept this PR on these conditions:

It doesn't cause any regression on Windows and Mac hosts.
You will develop the QEMU-side patch for testing this PR, and do your own testing on Linux. (We'll try to run some simple tests on a Linux host before merging it, but that won't be enough coverage.)
You will be the maintainer of all the Linux-specific code that this PR adds to the source tree. In other words, in case someone else submits a PR for such code in the future, we will invite you to the review and will accept or reject the change based on your feedback.

What do you think? I have some other tasks in hand at the moment, but I'll try to respond to the major changes you have proposed in the next few days. The minor changes all look fine to me :-)

AlexAltea commented 5 years ago

Sounds good to me! I will provide support for Linux frontend. Additionally, the planned changes will ensure easier maintenance in the future (e.g. by reducing code duplication in platform-specific code).

I will comment on this PR to discuss any further major changes that might affect generic or Windows/macOS-specific code. Once the patch is ready, I will run tests on the three platforms (of course, this implies taking care of QEMU-side patches myself) and let you know once everything is ready.

Also on the topic of testing: After completing this PR, it might be a good moment to setup something like Travis CI? This should also reduce maintenance effort.

raphaelning commented 5 years ago

Sounds good to me! I will provide support for Linux frontend. Additionally, the planned changes will ensure easier maintenance in the future (e.g. by reducing code duplication in platform-specific code).

Thanks! Indeed.

Regarding the first proposed major change:

Combine platform-specific folders. Moving {darwin,windows,linux}/ to X/{darwin,windows,linux}/, to avoid polluting the root folder. Folder "X" could be called drivers, platforms, etc.

Agreed. Right now we have three kinds of files in {darwin,windows}/:

a) Makefiles (e.g. windows/IntelHaxm.vcxproj). b) C/C++ source files that implement for the host OS the required interface specified by a shared header (e.g. both windows/hax_host_mem.c and darwin/.../hax_host_mem.cpp implement include/hax_host_mem.h). c) Other C/C++ source files specific to the host OS, mainly to implement the driver interface specified by the host OS (e.g. windows/hax_entry.c).

Since drivers only covers (c), I'd prefer platforms.

Two more questions related to this topic:

Currently, shared headers (as described by (b) above) exist in both core/include/ and include/, and it's not very clear to me how to choose between them when we create a new shared header. Is there a better scheme?
Have you thought about using CMake to generate the Linux makefiles, with an eye to eventually do so for Windows and Mac as well using a shared CMake configuration? This could be a crazy idea, because last time I checked, CMake didn't have mature support for WDK projects.

I'll comment on the other proposed major changes later.

raphaelning commented 5 years ago

Combine documentation: Moving API.md to X/api.md, and add platform-specific build instructions in X/building.md or X/building-{darwin,windows,linux}.md to avoid polluting README.md. Folder "X" could be called docs, etc.

Agreed. docs is a good name.

Single ioctl definitions: Define ioctl's only once at hax_interface.h via HAX_IOCTL(name, code, size). Then implement the HAX_IOCTL macro on haxinterface{darwin,windows,linux}.h to make the actual platform-specific ioctl definitions. This will reduce code duplication.

Good idea! Agreed.

Normalized integer types: The codebase mixes intN_t and intN. It would convenient to pick one variant and consistently use it.

Agreed. I see that you have started doing this and have chosen [u]intN_t over [u]intN, which is the harder choice (because the latter types are more heavily used throughout the codebase), but also the right one, I'd say (because the former types are more familiar, as they resemble the standard types in <stdint.h>).

The question is what's the best strategy to implement such a ubiquitous change, so as to avoid rebase headaches. Here's what I think:

Don't mix it with other changes.
Create separate PR(s) for it, which can be merged before this one.
Split it into multiple commits/PRs, each one covering as few as one source file (e.g. vcpu.c) or folder, which can be reviewed and merged quickly.

Please let me know if this is reasonable or if you have better ideas.

raphaelning commented 5 years ago

Also on the topic of testing: After completing this PR, it might be a good moment to setup something like Travis CI? This should also reduce maintenance effort.

Good idea! I just started reading the Travis CI documentation, and it seems to work best with Linux Makefile projects. Since there's a risk of a common (core) patch causing a regression on Linux hosts, maybe we can set up Travis CI for Linux first, which will at least tell us right away if a patch breaks the build on Linux. I'm not sure if it can help us run QEMU-based tests, though. Neither do I know if it's possible to use it for Windows and Mac builds.

AlexAltea commented 5 years ago

Have you thought about using CMake to generate the Linux makefiles, with an eye to eventually do so for Windows and Mac as well using a shared CMake configuration?

I didn't considered it yet, but that is much more convenient. For now, we could have each platform-specific build script (Makefile, Xcode-files, VS-files) in their corresponding platforms/* folder, and merge everything under a single root-level CMakeLists for a future PR.

The question is what's the best strategy to implement such a ubiquitous change, so as to avoid rebase headaches. Here's what I think:

Agreed, I'll submit all refactoring-related changes as a separate PR.

I'm not sure if it can help us run QEMU-based tests, though.

We can execute arbitrary scripts on Travis (build scripts run in sudo-enabled virtual machines), so it should be possible as long as their hardware supports it and they virtualize EPT to allow nested virtualization.

Neither do I know if it's possible to use it for Windows and Mac builds.

Travis can handle Linux, macOS and Windows: https://docs.travis-ci.com/user/reference/overview/#virtualization-environments

AlexAltea commented 5 years ago

Currently, shared headers (as described by (b) above) exist in both core/include/ and include/, and it's not very clear to me how to choose between them when we create a new shared header. Is there a better scheme?

Include folders give me quite some headaches. There's two issues:

The current approach to include headers is weird: We explicitly mention the "include" folders directly in the paths, e.g. #include "../include/*.h", which most C/C++ codebases rarely do. Instead, those include directories should be added to the Makefile/Projects to simply write #include <*.h>.
It's not clear what each include folder should contain. I see 5 types of headers:
- Headers of core consumed by platforms (e.g.: core/include/hax_core_interface.h): Right now they are in core/include, which might be the appropriate place.
- Headers of core consumed by core (e.g.: core/include/vmx.h). Right now they are in core/include. Maybe they should be next to their sources: that is, moving them to core, e.g. core/vmx.h. There's no reason for having them separate from sources, since there are no external consumers. This is already the case for platforms headers consumed by platforms (see below).
- Headers for platforms consumed by core, (e.g.: include/hax_host_mem.h). Right now they are in include... Does that make sense?
- Headers for platforms consumed by platforms, (e.g.: windows/hax_win.h). Right now they are each next to their sources.
- Headers consumed by QEMU (e.g.: include/hax_interface.h). Right now they are in include, which seems alright since they are the only definitions that HAXM needs to export for 3rd party software.

Anyway, this comment is just brainstorming on my side. I still need to give this problem more thought.

AlexAltea commented 5 years ago

Combine platform-specific folders. Moving {darwin,windows,linux}/ to X/{darwin,windows,linux}/

@raphaelning Regarding this point, I've noticed there's a dirs file in the root-directory pointing to core and windows, each containing a sources file. This file seems to generate a sources.props file. Do you know exactly how this happens (which tool is responsible for that)? Are these files necessary at all?

raphaelning commented 5 years ago

Ah, great question. Yes, I do know what they are.

When I first took over the project, it didn't even have a Visual Studio project (.sln or .vcxproj). Instead, it was using the WDK 7.1 build system to build the Windows driver. That build system was pretty straightforward/basic: you just need to provide very simple makefiles like dirs and sources, and then run build.exe.

Later we decided to migrate to WDK 10 and Visual Studio 2015. I did that in two steps:

Convert the dirs-based project to a Visual Studio solution, using a tool available in Visual Studio 2012. I think it was the Nmake2MsBuild CLI tool, or its GUI equivalent. The tool is not available in later releases of Visual Studio.
Open the VS2012 solution in VS2015 and proceed with the format upgrade.

Step 1 ended up generating all the VS files we see now. Since they are autogenerated, they are much more complicated than the original dirs and sources files (if you remember, we used to have a lot of build configurations, Win{7, 8, 8.1} {Debug, Release}, which were created by the conversion tool). For example, sources.props was generated from sources. The tool also created the awkwardly-named dirs-Package project, and made it the default project of HaxmDriver.sln.

I believe we can safely delete the old makefiles (dirs, sources, Makefile.inc, etc.). But then, you may still consider HaxmDriver.sln and the dirs-Package folder as polluting the root of the source tree. If we somehow make CMake work for Windows, we'll be able to get rid of all the VS files.

raphaelning commented 5 years ago

Include folders give me quite some headaches. There's two issues:

I completely agree. Thanks for the really nice summary.

Headers of core consumed by core (e.g.: core/include/vmx.h). Right now they are in core/include. Maybe they should be next to their sources: that is, moving them to core, e.g. core/vmx.h. There's no reason for having them separate from sources, since there are no external consumers. This is already the case for platforms headers consumed by platforms (see below).

I think vmx.h is somewhat similar to ia32_defs.h. They define constants and data structures according to Intel SDM, and those definitions are not unique to HAXM. In some sense, they are similar to hax_types.h. Therefore, although they seem to be consumed by core at the moment, they can be consumed by platform sources as well (or at least we should allow that). Shall we move them to a separate folder under include/?

Headers for platforms consumed by core, (e.g.: include/hax_host_mem.h). Right now they are in include... Does that make sense?

No. And I should take the blame. I created that header but wasn't sure where to put it, and ended up making the wrong decision.

AlexAltea commented 5 years ago

Brief status update: I've noticed that platform-specific changes on the QEMU-side are minimal. In fact, the hax-darwin.h and hax-darwin.c files are not specific to Darwin, but to any POSIX-compliant system, and I've managed to come up with a preliminary patch at https://github.com/kryptoslogic/qemu/commit/48db0b4f0bcd952e09c3355c2b64948bf703da58.

I've renamed hax-darwin.* to hax-posix.*, which seems to be the standard suffix that QEMU uses for POSIX-specific sources (search on https://github.com/qemu/qemu/find/master). That way, we will reduce even further the maintenance effort required by this PR.

If you agree, I'll submit the corresponding patch to QEMU after this PR has been merged.

EDIT: I've just noticed your question (sorry for missing it!):

Shall we move them to a separate folder under include/?

For now I will try to focus just on the Linux patch. Unless the issue with the confusing include's causes problems, I'd say we could leave it for a future PR, after this one has been merged.

raphaelning commented 5 years ago

I've renamed hax-darwin. to hax-posix., [...] If you agree, I'll submit the corresponding patch to QEMU after this PR has been merged.

Sounds great, thanks!

Unless the issue with the confusing include's causes problems, I'd say we could leave it for a future PR, after this one has been merged.

Agreed.

AlexAltea commented 5 years ago

HAXM now builds and runs successfully on Linux, or at least on Ubuntu 18.04 (Linux 4.17.0, 64-bit). After updating the TravisCI script, since they offer Ubuntu 14.04, I've verified that HAXM links fine against older kernels, specifically 4.4.0. It will probably build against any Linux 4.x kernel.

More importantly: The PR is ready for review now. Code-wise there's nothing pending. As you see, I've followed the same design choices as in the Windows/Darwin frontends.

PS: Mandatory screenshot: :-) capture

raphaelning commented 5 years ago

Ah, I just realized that I've only reviewed the first two commits. Will have to leave the other two for tomorrow...

AlexAltea commented 5 years ago

Turns out that Debian (debian-9.5.0-amd64-xfce-CD-1.iso) panics right after choosing "Graphical install" in the boot menu. The same crash happens on Windows hosts so it can hardly be a Linux-side issue.

~~This might be a regression caused by this patch, I'll investigate the issue further and report back.~~

EDIT: Nevermind, I just forgot to assign enough RAM to the VM. Everything works as expected!

raphaelning commented 5 years ago

Thanks, the latest commit looks great, but I'm still waiting for your response to the HAX_VCPU_SET_REGS ioctl issue.

mborgerson commented 5 years ago

I've tested this on an Ubuntu 18.04 machine and was able to boot a QEMU guest! Great work, @AlexAltea!

Unfortunately however, I did have some issues. First is that the device files are created without any permissions. This makes it a pain when launching QEMU as a regular user. I think the files should probably be created with the group being set to haxm or something, and owner/group perms being r/w. This is how KVM does it as well:

crw-rw---- 1 root kvm 10, 232 Nov  2 00:46 /dev/kvm

Second, I also experienced some system instability when using HAXM which caused a hard kernel hang and forced me to hard-reset my system. This behavior happened a few times, always after trying to launch another program while QEMU was running in the background. Unfortunately, I have not captured a proper trace to help track down the source of the hang. I would not suggest to block a merge for this one however.

AlexAltea commented 5 years ago

I'm still waiting for your response to the HAX_VCPU_SET_REGS ioctl issue.

@raphaelning I have just removed those extra lines, at the *SET_REGS handler on 7062cce. That ioctl should not modify the vcpu_state_t struct. If that behavior breaks userland, then it's an issue on userland.

If you deem it necessary, I can cleanup the commit history before merging this branch.

I think the files should probably be created with the group being set to haxm or something, and owner/group perms being r/w. This is how KVM does it as well [...]

@mborgerson That's a good point, however this does not occur at kernel-level. In fact, on a Linux distro without QEMU, after manually creating /dev/kvm via its major/minor identifiers, I obtain this information:

$ ls -l /dev/kvm
crw-r--r-- 1 root root 10, 232 Nov  2 09:54 /dev/kvm

It's only after installing QEMU that the appropriate information is created:

$ cat /etc/group | grep kvm
kvm:x:125:
$ cat /lib/udev/rules.d/40-qemu-system-common.rules 
KERNEL=="kvm", GROUP="kvm", MODE="0660"
$ ls -l /dev/kvm 
crw-rw---- 1 root kvm 10, 232 Nov  2 09:54 /dev/kvm

I understand that the QEMU/KVM relationship has "historical background", but I personally find that HAXM should not have any dependency on QEMU, despite it being its primary consumer: There exist uses for hypervisors other than full-system emulation.

For now, I suggest that we handle group creation and udev rules with the Makefile.

make install: Will install the haxm.ko kernel module, create a haxm group, and create an udev rule to assign the /dev/HAX device to the haxm group.
make uninstall: Will return /dev/HAX to root, remove the udev rule, remove the haxm group and finally uninstall the haxm kernel module.

Note that Linux users won't automatically be added to the haxm group. They will need to do so manually via the following command:

sudo adduser `id -un` haxm

Afterwards, they will be able to launch qemu-system-x86_64 -accel hax as a regular user. All this information will be added to docs/manual-linux.md.

@raphaelning Sounds good? Proposal at dbfba2f.

Second, I also experienced some system instability when using HAXM which caused a hard kernel hang and forced me to hard-reset my system.

@mborgerson I will investigate this issue further. Thank you!

mborgerson commented 5 years ago

@AlexAltea thanks for looking into it! I wasn't aware that those perms on /dev/kvm were handled with a udev rule, but that makes sense. I think your suggestion regarding make [un-]install + documentation sounds good.

maronz commented 5 years ago

I'm quite impressed with the progress this project has recently seen. So I was curious, whether I could build it myself.

One of my observations (i.e. that 'make clean' leaves most of the *.o files still in place) looks to have been addressed by 'dbfba2f'.

Another one is that it looks like that no attempt has been made to build this on a 32-bit Linux system. I guess the NASM output file format will have to be dependent on the architecture. Furthermore the compiler ran into heaps of errors related to the 'ASMCALL' and 'EMCALL' macros. Since this is not my area of expertise, I can only report my finding, and hope that a solution is not too difficult to be found.

AlexAltea commented 5 years ago

Another one is that it looks like that no attempt has been made to build this on a 32-bit Linux system.

@maronz Thanks a lot for your feedback! Indeed, 32-bit Linux hosts hadn't been tested yet. After b6c43ac it has been fixed. As you mentioned, it required generating 32-bit objects with NASM and fixing some calling convention macros.

However, after successful installation of the kernel module, HAXM failed to run a 32-bit guest kernel, debian-9.5.0-i386-xfce-CD-1.iso (which runs fine on 64-bit Linux hosts with HAXM). The issue seems quite different others I've seen so far (error happens inside asm_vmxrun, i.e. the CPU itself is reporting the failure), so maybe it's a nested virtualization bug of the underlying hypervisor, in my case VMware. Logs:

[   65.146688] haxm_debug: vm entry!
[   65.146948] haxm_debug: HAX: VM entry failed: result=2 RIP=0000fff0
[   65.146958] haxm_info: HAX: Prev exit: 0 error code: 7
[   65.146960] haxm_info: HAX: VM entry failed
[   65.146961] haxm_info: end of cpu_vmentry_failed
[   65.146963] haxm_debug: cpu_vmx_run error, code:2lx

If you are running a 32-bit Linux distribution "bare metal", I would be really helpful if you could re-test this patch at b6c43ac and let me know whether you can replicate the issue (or whether it works).

@raphaelning Right now, some logs don't feel helpful enough (see messages above). I often find myself asking what's result=2, or error code: 7, or code:2lx. Ironically, functions that do indeed provide helpful feedback for debugging purposes, are disabled and/or annoyingly interleaved with code:

Would it be reasonable to uncomment and move such helper functions to a single file (maybe even the dump.c file that already exists)? Additionally, we could add helper functions to transform enum constants into strings, to print the meaning of error codes directly in the logs. What are your thoughts?

I'm asking just for the sake of discussion. If accepted, this proposal should be a separate PR. :-)

raphaelning commented 5 years ago

@AlexAltea The new commits look good, except that I haven't been able to get qemu-system-x86_64 -accel hax working on my Ubuntu 16.04:

Failed to open vm 0
Failed to create HAX VM
No accelerator found.
qemu-system-x86_64: failed to initialize HAX: Invalid argument

I'll look into the "invalid argument" error later. But if it rings a bell with you, please help :-)

I've already added my user to the haxm group, logged out and logged back in. Without the re-login step, I'd get a different error:

Failed to open the hax module
No accelerator found.
qemu-system-x86_64: failed to initialize HAX: No such device

I also noticed an error during sudo make install, but I'm not sure if that's related:

$ sudo make install
make -C /lib/modules/4.15.0-38-generic/build M=$PWD modules_install
make[1]: Entering directory '/usr/src/linux-headers-4.15.0-38-generic'
  INSTALL /home/sdkadmin/Code/haxm/platforms/linux/haxm.ko
At main.c:160:
- SSL error:02001002:system library:fopen:No such file or directory: bss_file.c:175
- SSL error:2006D080:BIO routines:BIO_new_file:no such file: bss_file.c:178
sign-file: certs/signing_key.pem: No such file or directory
  DEPMOD  4.15.0-38-generic
make[1]: Leaving directory '/usr/src/linux-headers-4.15.0-38-generic'
./haxm-install.sh

Lastly, I'm using the latest QEMU master, since it was pretty easy to rebase https://github.com/kryptoslogic/qemu/commit/48db0b4f0bcd952e09c3355c2b64948bf703da58 onto it. I'll try v2.12 later.

raphaelning commented 5 years ago

Afterwards, they will be able to launch qemu-system-x86_64 -accel hax as a regular user. All this information will be added to docs/manual-linux.md.

@raphaelning Sounds good? Proposal at dbfba2f.

Yes, the install/uninstall scripts are very handy. Please update the Linux manual accordingly.

Would it be reasonable to uncomment and move such helper functions to a single file (maybe even the dump.c file that already exists)? Additionally, we could add helper functions to transform enum constants into strings, to print the meaning of error codes directly in the logs. What are your thoughts?

Indeed there are quite a few problems with the HAXM fatal error dump/log, especially in terms of completeness and readability as you have pointed out. Both of your ideas should be great steps towards putting this right!

maronz commented 5 years ago

@AlexAltea thanks a lot for b6c43ac. I can confirm that I was able to successfully build the kernel module on a 32-bit "bare system". I don't actually use a "proper" installation, I just use the "live session" that comes ups when I boot the system (e.g. 'Linux Mint 19 Mate 32-bit') from a USB stick.

The HAXM driver installed, and I then started to "kick the tyres". Somehow, during this a bit unstructured testing I managed to produce a crash (as I could see in the 'dmesg' output). I then had the (stupid) idea to create a script, so that I'd be able to reproduce results this in a repeatable fashion. The script installs any additionally required packages, downloads the HAXM and QEMU sources, patches the latter, builds both and attempts to run a test.

Albeit now, when the script runs I did not see a crash, but rather essentially the same failure as @raphaelning mint@mint:~/qemu-3.0.0$ ./i386-softmmu/qemu-system-i386 -accel hax Failed to open vm 5 Failed to create HAX VM No accelerator found. qemu-system-i386: failed to initialize HAX: Invalid argument mint@mint:~/qemu-3.0.0$ ./i386-softmmu/qemu-system-i386 -accel hax Failed to open vm 6 Failed to create HAX VM No accelerator found. qemu-system-i386: failed to initialize HAX: Invalid argument (note the increment in the VM number)

Operating in such a "live session" allows me also to do things that might not be quite "kosher" on a "proper" system. In that spirit I then ran: sudo ./i386-softmmu/qemu-system-i386 -accel hax and that resulted in "crash" output in the 'dmesg', and a "hanging" process, which showed up in 'ps -efH' output as: root 23641 15067 0 22:14 pts/1 00:00:00 sudo ./i386-softmmu/qemu-system-i386 -accel hax root 23642 23641 0 22:14 pts/1 00:00:00 [qemu-system-i38] <defunct> and required a sudo kill -9 23642 to get rid of it.

So, in summary: the HAXM module does not (yet) work for me on a (32-bit) Linux system.

PS: If anyone sees any benefit in it, I still have the script and the 'dmesg' output, and be happy to share it.

raphaelning commented 5 years ago

I got the same "Failed to open vm" error after switching to https://github.com/kryptoslogic/qemu/commit/48db0b4f0bcd952e09c3355c2b64948bf703da58 (QEMU v2.12.1) :-( I haven't looked into it yet.

If you deem it necessary, I can cleanup the commit history before merging this branch.

That would be ideal, if it's not too much burden for you.

raphaelning commented 5 years ago

@maronz Ah interesting, so basically I just need to run QEMU with sudo? That's not ideal, but I'll give it a try tomorrow.

EDIT: And yes, I observe the increment in the VM number too.

AlexAltea commented 5 years ago

The new commits look good, except that I haven't been able to get qemu-system-x86_64 -accel hax working on my Ubuntu 16.04:
Failed to open vm 0
Failed to create HAX VM

@raphaelning @maronz Sorry, the HAXM VM/VCPU devices hadn't been assigned to the haxm group, That's why only running as root worked. I'll patch the driver to update ownership information accordingly.

The incrementing VM number is definitely a consequence of this issue, but I'll still investigate it further to ensure there's no logic issue: VM's are destroyed once the reference counter, decremented by hax_put_vm, reaches zero, so lingering VM's are definitely bad news.

I've already added my user to the haxm group, logged out and logged back in. Without the re-login step, I'd get a different error.

@raphaelning After adding the current user to any group, it's necessary to re-login. Otherwise, it cannot open /dev/HAX which results in the error you see. I've added the corresponding information in the manual. Alternatively, you can get a shell with updated groups without re-login via: exec su -l $USER.

If anyone sees any benefit in it, I still have the script and the 'dmesg' output, and be happy to share it.

@maronz It could be useful. If a crash ever occurs, it would be really helpful to share a dmesg snippet (filtering out any sensitive information).

I also noticed an error during sudo make install, but I'm not sure if that's related:
sign-file: certs/signing_key.pem: No such file or directory

@raphaelning The reason is that the Makefile's modules_install target (dependency of install target), attempts to sign the kernel module with the mentioned key pair, which despite being automatically generated during kernel build, they are not shipped with Linux distros for obvious reasons. This has resulted in several issues in many drivers.

If you don't have a key pair already, the solution is creating one manually:

cd /lib/modules/`uname -r`/build

Then, create the file: certs/x509.genkey, with the following contents (based on certs/Makefile):

[ req ]
default_bits = 4096
distinguished_name = req_distinguished_name
prompt = no
string_mask = utf8only
x509_extensions = myexts

[ req_distinguished_name ]
CN = Build time autogenerated kernel key

[ myexts ]
basicConstraints=critical,CA:FALSE
keyUsage=digitalSignature
subjectKeyIdentifier=hash
authorityKeyIdentifier=keyid

Finally, create the key pair with (based on certs/Makefile and /boot/config-* variables):

openssl req -new -nodes -utf8 -sha512 -days 36500 -batch -x509 -config certs/x509.genkey -outform PEM -out certs/signing_key.x509 -keyout certs/signing_key.pem

This should suffice to self-sign the kernel module. Of course, you will still get a "PKCS#7 signature not signed with a trusted key" warning in dmesg, unless the public key is trusted by the kernel.

I could add these steps to the manual, but right now Ubuntu 18.04 (and most distros?) haven't enabled a strict driver trusted signature enforcement yet. Even if enabled, for testing/debugging it's much simpler to disable it, for release/production it's much easier to give users pre-signed binaries, so I'm not sure if these lengthy instructions should be written in the manual.

raphaelning commented 5 years ago

@raphaelning @maronz Sorry, the HAXM VM/VCPU devices hadn't been assigned to the haxm group, That's why only running as root worked. I'll patch the driver to update ownership information accordingly.

I see, thanks! Indeed the lingering /dev/hax_vm/vmXX device nodes are owned by root:root. I checked the Darwin implementation, which calls devfs_make_node() to create these special files. That function conveniently asks for the owner, group and mode of the new file, and HAXM passes to it the current process's UID, GID and 0600. This piece of logic is missing from the current Linux implementation, which seems to rely on misc_register() to create the device node. I hope you can find a way to specify ownership information, either in the driver or with udev rules. But we could also argue that creating a device node for each VM and vCPU is not the best design in the first place, because those should only be visible to one process (QEMU) and should not consume system-wide resources...

The incrementing VM number is definitely a consequence of this issue, but I'll still investigate it further to ensure there's no logic issue: VM's are destroyed once the reference counter, decremented by hax_put_vm, reaches zero, so lingering VM's are definitely bad news.

I can confirm that running QEMU as root works :-) However, HAXM fails to clean up the VM and vCPU device nodes after I quit QEMU, so there's indeed a flaw in the reference counting mechanism. I think I've seen lingering device nodes on Mac as well, although very rarely (only when HAXM runs in to a fatal error). If this issue turns out to be outside the scope of this PR, we can merge this PR first.

raphaelning commented 5 years ago

I could add these steps to the manual, but right now Ubuntu 18.04 (and most distros?) haven't enabled a strict driver trusted signature enforcement yet. Even if enabled, for testing/debugging it's much simpler to disable it, for release/production it's much easier to give users pre-signed binaries, so I'm not sure if these lengthy instructions should be written in the manual.

Thanks for the detailed explanation! I see that signing the driver is not very useful at the moment, so let's not add these steps to the manual. However, since people might get confused by the sign-file error, how about simply noting in the manual that the error can be safely ignored?

maronz commented 5 years ago

I'm back again with another round of observations. And as it is a new day, I've decided to change the Linux distribution for today's testing to be Knoppix v8.2 (with either a 4.16.5 or a 4.16.5-64 kernel). For the "price" of a single ISO file download, one gets the option to boot either into a 32 or a 64-bit system. [ I've now come to the impression, that this distribution is a combination of two different kernels, but always with a 32-bit userland ].

(1) My inability to use '-accel hax' on a 32-bit (bare metal) Linux system continues. But I've finally noticed a message in the 'dmesg' output, when running 'sudo make install', that might help in the problem identification: [ 648.771322] haxm_error: NX is not enabled in the system, HAXM does not function. [ 648.771338] haxm_warning: -------- HAXM v7.3.2 Start --------

Initially I ran this on a 'i5-520M' CPU, of which I was never quite sure, whether HAXM would be supported. But I repeated the test on a 'i5-4300U', and the result was the same.

(2) I then moved on to repeat my test with the 64-bit kernel. There I ran into the problem that Knoppix v8.2, when booted in 64-bit mode, reports the following slightly unexpected results: knoppix@Microknoppix:~$ getconf LONG_BIT ; uname -m 32 x86_64

With this in mind, it is clear why 'haxm.ko' failed to be built, as NASM had produced 32-bit objects. I changed the architecture check in the 'Makefile' via: sed -i 's#LBITS#MACHINE# ; s#getconf LONG_BIT#uname -m# ; s#,64#,x86_64#' Makefile

This seems to have done the trick, and maybe that is a more generic method to determine the processor architecture. After the kernel module got built, the 'sudo make install' did not cause another message in the 'dmesg' output, and apart from the (current) need to run QEMU as super-user I was able to use '-accel hax' successfully (even on the old i5-520M).

raphaelning commented 5 years ago

[ 648.771322] haxm_error: NX is not enabled in the system, HAXM does not function.

@maronz Apparently your 32-bit Linux kernel doesn't enable the IA32_EFER.NXE (No-Execute Enable) flag, which makes sense if the kernel is built without PAE support (CONFIG_X86_PAE). I don't know why HAXM cares about NXE, but apparently, all host OSes that HAXM currently supports (including 32-bit Windows) enable it.

This error may not be fatal, because it doesn't really cause HAXM initialization to fail--instead, HAXM just proceeds in the normal code path after printing this error/warning. Therefore, it may not be the root cause of the crash you are seeing.

AlexAltea commented 5 years ago

This piece of logic is missing from the current Linux implementation, which seems to rely on misc_register() to create the device node.

@raphaelning Indeed. I'll find a way around that and report back once I fix the issue.

But we could also argue that creating a device node for each VM and vCPU is not the best design in the first place, because those should only be visible to one process (QEMU) and should not consume system-wide resources...

Agreed. Probably creating an anonymous file instance and returning the file descriptor to userland would work: it limits VM/VCPU access to QEMU. That's anon_inode_getfd on Linux. Hopefully, there's something similar in Windows/macOS. Anyway, this should be left for future patches since it's a big API change.

I think I've seen lingering device nodes on Mac as well, although very rarely

I might have found (part of?) the root cause while working on these patches since the code came straight from the haxm-darwin frontend. Some ioctl's exit without decrementing the reference counter (via hax_put_vcpu and hax_put_vm):

maronz commented 5 years ago

@AlexAltea : Quick question that I've missed to ask: which Linux distribution have you used for your testing of a 32-bit host? Instead of picking for my "bare metal" testing more or less randomly a distribution that might throw up new questions, I'd rather use one where, I might have a chance of success.

AlexAltea commented 5 years ago

There I ran into the problem that Knoppix v8.2, when booted in 64-bit mode, reports the following slightly unexpected results:

@maronz That's surprising. Thanks for providing the Makefile patch, that seems to be more reliable.

Quick question that I've missed to ask: which Linux distribution have you used for your testing of a 32-bit host?

I used Ubuntu 14.04, specifically ubuntu-14.04.5-desktop-i386.iso. Let me know you will see any noticeable difference with it. :-)

AlexAltea commented 5 years ago

Last issue fixed: Similarly to haxm-darwin, newly created VM/VCPU devices will be owned by the UID:GID of the calling process. This is done by looking-up the inode for the device path and modifying its attributes. The solution isn't ideal, but as mentioned earlier, it's the best we can do without API changes.

Current status of haxm-linux:

64-bit Linux hosts: builds and runs fine. No lingering issues.
32-bit Linux hosts: builds, but fails at VM-entry. I need extra help/time to fix it (info at https://github.com/intel/haxm/pull/108#issuecomment-435691965).

raphaelning commented 5 years ago

@AlexAltea Thanks! The new commits look good, and I'm now able to launch QEMU without sudo on Ubuntu 16.04 (x86_64) :-)

Some ioctl's exit without decrementing the reference counter (via hax_put_vcpu and hax_put_vm)

Good catch! All of these happen only in extreme circumstances. I may have run into one of them on Mac--I'm not sure, but I don't think they can explain the lingering device nodes issue with haxm-linux, which happens 100% of the time for me. In fact, this is what I get today after launching QEMU twice:

$ ll /dev/hax*
/dev/hax_vm:
total 0
drwxr-xr-x  2 root     root       80 Nov  7 11:42 ./
drwxr-xr-x 22 root     root     4440 Nov  7 11:42 ../
?rw-rw----  1 myuser   myuser      0 Nov  7 11:41 vm00
?rw-rw----  1 myuser   myuser      0 Nov  7 11:42 vm01

/dev/hax_vm00:
total 0
drwxr-xr-x  2 root     root       60 Nov  7 11:41 ./
drwxr-xr-x 22 root     root     4440 Nov  7 11:42 ../
?rw-rw----  1 myuser   myuser      0 Nov  7 11:41 vcpu00

/dev/hax_vm01:
total 0
drwxr-xr-x  2 root     root       60 Nov  7 11:42 ./
drwxr-xr-x 22 root     root     4440 Nov  7 11:42 ../
?rw-rw----  1 myuser   myuser      0 Nov  7 11:42 vcpu00

BTW, the vmXX and vcpuXX are printed in red.

And I see the following warnings in dmesg, which reveal one or two other issues:

[  137.841246] haxm_warning: hax_alloc_pages: HAX_MEM_LOW_4G is ignored
[  137.909685] haxm_warning: ramblock_free_chunks: chunks[0] existed but its bit in chunks_bitmap was not set: size=0x10000, block.size=0x10000
[  137.953487] haxm_warning: ramblock_free_chunks: chunks[0] existed but its bit in chunks_bitmap was not set: size=0x40000, block.size=0x40000
[  224.986280] haxm_warning: hax_alloc_pages: HAX_MEM_LOW_4G is ignored
[  225.052579] haxm_warning: ramblock_free_chunks: chunks[0] existed but its bit in chunks_bitmap was not set: size=0x10000, block.size=0x10000
[  225.093507] haxm_warning: ramblock_free_chunks: chunks[0] existed but its bit in chunks_bitmap was not set: size=0x40000, block.size=0x40000

Note that the usual ...........hax_teardown_vm warning is missing, which is further evidence that the HAXM VM object is never destroyed.

I don't think any of these issues should block the merge of this PR. I just want to make sure you are aware of these issues, in case you haven't seen them yourself, so that you can address them in a later PR.

maronz commented 5 years ago

The following has been observed (for a change) on a 64-bit Linux Mint 19, with a 4.15.0 kernel.

The '/dev/HAX' device is still getting created with 0600 permissions, and 'root:root' ownership, but correcting those (via: sudo chmod 660 /dev/HAX && sudo chown root:haxm /dev/HAX) allows to successfully start VMs with 'accel hax' without the use of 'sudo'.

After I've have started 8 VMs I'm out of luck, and I'm getting: Failed to create vm ffffffff Failed to create HAX VM No accelerator found. qemu-system-i386: failed to initialize HAX: Invalid argument

Even sudo make uninstall && sudo make install && sudo rm -r /dev/hax_vm* does not help, and any further attempt to start a VM ends up with: sysfs: cannot create duplicate filename '/devices/virtual/misc/hax_vm!vm00' in the 'dmesg' output (followed by some process dump).

AlexAltea commented 5 years ago

The '/dev/HAX' device is still getting created with 0600 permissions, and 'root:root' ownership, but correcting those (via: sudo chmod 660 /dev/HAX && sudo chown root:haxm /dev/HAX)

@maronz That's weird. Did that happen even after running sudo make install? Because the installation script will run those two chmod + chown commands. I can't replicate this issue.

After I've have started 8 VMs I'm out of luck, and I'm getting

This issue most likely has to do with the incrementing VM ID problem. I'll fix that one now.

AlexAltea commented 5 years ago

Linking issues on macOS are fixed as of f153b54. Then I rebased on top of master, which required fixing the Kbuild script on 12e4821.

Good catch! All of these happen only in extreme circumstances. I may have run into one of them on Mac--I'm not sure

Should I submit a PR to add those mising hax_put_vcpu and hax_put_vm calls?

As for the lingering device issues on Linux, I'll investigate that now, as well as the ramblock warnings and the reason why the VM/VCPU's are shown in red: it seems it's detecting the file with an unknown type (note the ? in ?rw-rw----, it should be c instead), maybe a side-effect of messing with the inode directly.

maronz commented 5 years ago

@AlexAltea Regarding the ownership/permission problem I've now found something of a work-around: after adding to 'haxm-install.sh' a final line of ls -l /dev/$DEVNAME the problem seems to "disappear" (at least on the one system, I've now done many iterations of testing).

Another option that appears to work is inserting a sleep 1 before the 'chown' command. If one puts that delay in between the 'chown' and the 'chmod' the latter gets done, but not the former. This last observation indicates to me, that some weird race condition might be in play here. I'm probably as dumbfounded as yourself, as I can't see anything wrong in the original script,

Anyway, the 'ls -l ...' is probably more useful, as one would get to see right at the end of the install script whether the device creation has worked, or not.

AlexAltea commented 5 years ago

@maronz Indeed, it sounds like a race condition. Maybe modprobe returns right after loading the module, but does not wait for the initialization function to finish, thus chmod+chown might be called before the HAXM device exists.

I've updated the installation script to wait up to 3 seconds (if necessary), which should be way more than enough time for the device to be created. Let me know if this fixes your issue.

AlexAltea commented 5 years ago

@raphaelning Regarding the 32-bit Linux host issues I found: Now with proper logging I see:

[   73.853623] haxm_warning: -------- HAXM v7.3.2 Start --------
[   98.103785] haxm_warning: hax_alloc_pages: HAX_MEM_LOW_4G is ignored
[   98.559302] haxm_error: VM entry failed: RIP=0000fff0
[   98.559311] haxm_error: VMfailValid. Prev exit: 0. Error code: 7 (VMX_ERROR_ENTRY_CTRL_FIELDS_INVALID)

The Intel SDM Vol. 3C claims at 30.4 VM Instruction Error Numbers that this code means:

VM entry with invalid control field(s) b,c

b. VM-entry checks on control fields and host-state fields may be performed in any order. Thus, an indication by error number of one cause does not imply that there are not also other errors. Different processors may give different error numbers for the same VMCS.

c. Error number 7 is not used for VM entries that return from SMM that fail due to invalid VM-execution control fields in the executive VMCS. Error number 25 is used for these cases.

Any clues as to debug which control field(s) might be invalid?

EDIT#1: I've added VMCS dumps for a 32-bit Windows host and a 32-bit Linux host, before and after the following snippet in cpu_vmx_run at cpu.c (both using the latest revision of the linux branch):

    result = asm_vmxrun(vcpu->state, vcpu->launched);

    vcpu->is_running = 0;
    vcpu_save_guest_state(vcpu);
    vcpu_load_host_state(vcpu);

These are the files: Does anything look suspicious?

EDIT#2: After diffing the files, aside from many expected differences in HOST_* fields I've noticed a mismatch in VMX_EXIT_CONTROLS:

-haxm_warning: 400c VMX_EXIT_CONTROLS: 36dff # windows
+haxm_warning: 400c VMX_EXIT_CONTROLS: 236fff # linux

maronz commented 5 years ago

@AlexAltea I'm afraid, I owe you an apology. Before my last comment (about the race condition in the install script) I did not read the script carefully enough. [ I guess it being late in the evening at my end did not help things. ]

After having done a quick test of a0b270b, and still (!!) noticing the permission/ownership issue, I've had another look at the script, and have now come to the conclusion that we've got a "cart before the horse" situation. I'm now suggesting to re-arrange the order of the critical steps into the following order:

create the group (if necessary),
create the 'udev' rule, and
then load the kernel module

With this sequence of events I'm now confident that any attempt of "fixing" the permission/ownership afterwards won't be required. The current order of processing is: 3,1,2 (i.e. load the module first, then set up the group and the 'udev' rule). And that seems to explains (in hindsight) what has been observed up to now.

I'm attaching a proposal of a re-worked 'haxm-install.sh' script for your perusal. haxm-install.sh.gz

AlexAltea commented 5 years ago

@maronz Thanks a lot for tracking down the issue! I've added your patched file and amended the commit.

raphaelning commented 5 years ago

After diffing the files, aside from many expected differences in HOST_* fields I've noticed a mismatch in VMX_EXIT_CONTROLS

Thanks for doing all the hard work to track this issue down to this simple diff! Apparently, the Linux driver incorrectly sets bit 9 (Host address-space size) and bit 21 (Load IA32_EFER), both which should be 0 for a 32-bit host OS (especially when the host kernel is 32-bit). With this information, it's very easy to identify the code that sets these bits:

https://github.com/intel/haxm/blob/24fdff5da8a15347463bfe12e35da56fbdd74cef/core/vcpu.c#L1360-L1367

As I've pointed out in the above comment, the solution is to make sure is_compatible() evaluates to 0 on Linux hosts.

AlexAltea commented 5 years ago

@raphaelning Thank you! At first, I was sceptical about is_compatible() being the culprit so I never bother changing it and testing. But that fixed indeed the problem. :-)

Now with 64-bit and 32-bit Linux hosts on the same grounds I'm going to fix the lingering VM/VCPU issues, after some checking I figured out I didn't implement open and release (i.e. close) file operations for them. I've already done it locally, which should fix the issue, but during hax_clear_vcpumem(cv->tunnel_vcpumem); there's a kernel panic happening, caused by vm_munmap.

I'll investigate why this happens and report back.

raphaelning commented 5 years ago

The new commits look good to me, but the lingering device nodes issue is somehow still not completely fixed:

(Before the test, I rebooted my computer and confirmed that /dev/hax* didn't exist.)

The first QEMU VM creates /dev/hax_vm/vm00 and /dev/hax_vm00/, but does not delete them in the end.
The second VM somehow manages to "reuse" vm00 instead of creating vm01. In the end, it doesn't delete vm00 either.
The third and fourth VMs are run in parallel. One of them reuses vm00, and the other creates vm01. In the end, we have both vm00 and vm01 nodes lingering around.

BTW, we have tested this PR (in its current state) on Windows and Mac without running into any issue, so we're ready to merge it. But if you want to add any Linux-specific patches, we can wait.

AlexAltea commented 5 years ago

@raphaelning Great! None of the remaining issues require changing core or Windows/Mac-specific code, so the tests won't be needed again. Once everything is ready from my side, I'll rewrite the commit history as:

Normalized prefixes of functions/types
Added support for Linux hosts
Added Linux to TravisCI build matrix
Manual for building/testing HAXM on Linux

the lingering device nodes issue is somehow still not completely fixed

This seems caused by the inode modification, which also caused the devices to be shown in red with unknown type ? (permissions ?rw-rw----). Reverting ceed27e fixes both issues, but makes sudo necessary again. I'll investigate and fix this.

raphaelning commented 5 years ago

@AlexAltea Sounds good, thanks! My previous comment coincided with https://github.com/intel/haxm/pull/108/commits/8f5956448c786b579e2a9ed9171444ecb609c47e, but I was testing without this commit. It looks like an important fix though.

raphaelning commented 5 years ago

I can confirm that https://github.com/intel/haxm/pull/108/commits/edcd0ef09ebf11687199117f827dc7d9d9f78d0c has eradicated lingering VM/vCPU device nodes, and those device nodes now show up with the correct file type (c) in ls -l output. Awesome!

BTW, in the final patch set, I'd like to see https://github.com/intel/haxm/pull/108/commits/d673697fbdb342edc4bc9d76d006e51493637872 ("Ensuring all VM/VCPU ioctl exit paths decrement the refcounter") remain a separate commit.

intel / haxm