hermit-os / uhyve

A specialized hypervisor for Hermit.
Apache License 2.0

Support nested virtualization (for rusty-hermit) #6

Open jschwe opened 4 years ago

jschwe commented 4 years ago

When using nested virtualization, rusty-hermit currently panics while detecting the CPU frequency, since all detection methods fail. Uhyve should provide the CPU frequency even in a nested virtualization environment. This can be done either by ensuring detect_from_hypervisor() works or by modifying the CPUID brand string to contain the clock speed.

Also (this still needs some more testing on my side though) uhyve should print error messages / quit in the same way for nested virtualization as it does for normal virtualization. Currently uhyve seems to print less when using nested virtualization.

jschwe commented 4 years ago

I'd like to add the output of lscpu for both native and virtual Ubuntu. Since all the necessary information is available in the VirtualBox guest, uhyve should be able to provide the clock speed to rusty-hermit. I do not really understand yet how detect_from_hypervisor() actually works, so I might be wrong about this.

I also double-checked, and uhyve drops error messages when running with nested virtualization. I compiled and inspected the same program with gdb, and verified that both the native and the nested run panic at the same .expect() line. The nested uhyve, however, doesn't output any clear error message. Is there a good way to debug uhyve itself with gdb?

lscpu native Ubuntu

```
$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               94
Model name:          Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz
Stepping:            3
CPU MHz:             800.017
CPU max MHz:         3900,0000
CPU min MHz:         800,0000
BogoMIPS:            6999.82
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            6144K
NUMA node0 CPU(s):   0-3
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
```

lscpu virtualized Ubuntu (kvm hypervisor)

```
$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               94
Model name:          Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz
Stepping:            3
CPU MHz:             3503.998
BogoMIPS:            7007.99
Virtualization:      VT-x
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            6144K
NUMA node0 CPU(s):   0-3
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq vmx ssse3 cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti tpr_shadow flexpriority fsgsbase avx2 invpcid rdseed clflushopt md_clear flush_l1d
```

panic with uhyve running on virtual ubuntu

```
HERMIT_VERBOSE=1 ../uhyve/target/debug/uhyve target/x86_64-unknown-hermit/debug/rusty_demo
[0][INFO] Welcome to HermitCore-rs 0.3.25
[0][INFO] Kernel starts at 0x200000
[0][INFO] BSS starts at 0x446600
[0][INFO] TLS starts at 0x444d60 (size 264 Bytes)
[0][INFO] Total memory size: 64 MB
[0][INFO] A pure Rust application is running on top of HermitCore!
[0][INFO] Heap: size 54 MB, start address 0x600000
[0][INFO] Heap is located at 0x600000 -- 0x3c00000 (0 Bytes unmapped)
[0][INFO]
[0][INFO] ===================== PHYSICAL MEMORY FREE LIST ======================
[0][INFO] 0x00000003C00000 - 0x00000004000000
[0][INFO] ======================================================================
[0][INFO]
[0][INFO]
[0][INFO] ================== KERNEL VIRTUAL MEMORY FREE LIST ===================
[0][INFO] 0x00000003C00000 - 0x00800000000000
[0][INFO] ======================================================================
[0][INFO]
ERROR 2020-04-26T08:23:34Z: uhyve::linux::vcpu: Internal error
ERROR 2020-04-26T08:23:34Z: uhyve: CPU 0 crashes! Unknown exit reason.
```

panic with uhyve running on native ubuntu

```
$ HERMIT_VERBOSE=1 ../uhyve/target/debug/uhyve target/x86_64-unknown-hermit/debug/rusty_demo
[0][INFO] Welcome to HermitCore-rs 0.3.25
[0][INFO] Kernel starts at 0x200000
[0][INFO] BSS starts at 0x43f600
[0][INFO] TLS starts at 0x43d048 (size 264 Bytes)
[0][INFO] Total memory size: 64 MB
[0][INFO] A pure Rust application is running on top of HermitCore!
[0][INFO] Heap: size 54 MB, start address 0x600000
[0][INFO] Heap is located at 0x600000 -- 0x3c00000 (0 Bytes unmapped)
[0][INFO]
[0][INFO] ===================== PHYSICAL MEMORY FREE LIST ======================
[0][INFO] 0x00000003C00000 - 0x00000004000000
[0][INFO] ======================================================================
[0][INFO]
[0][INFO]
[0][INFO] ================== KERNEL VIRTUAL MEMORY FREE LIST ===================
[0][INFO] 0x00000003C00000 - 0x00800000000000
[0][INFO] ======================================================================
[0][INFO]
thread '' panicked at 'Could not determine the processor frequency: ()', src/arch/x86_64/kernel/processor.rs:405:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
[0][INFO] Shutting down system
```

stlankes commented 4 years ago

Hm, I have a similar setup and it works for me. Can you check whether https://github.com/hermitcore/uhyve/blob/master/src/vm.rs#L683 determines the correct frequency on your system? Which Linux kernel do you use?

jschwe commented 4 years ago

On native Ubuntu freq is 3500, which is correct. On virtual Ubuntu (tested on my vagrant box) freq is 0. Uhyve should probably also give an error/info message for this similar to the None case and skip the write_volatile().
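The suggestion above, treating a frequency of 0 like the None case and skipping the write, could look roughly like the following. This is only a minimal sketch: `BootInfo`, `usable_frequency` and `publish_frequency` are hypothetical names made up for illustration and are not taken from the uhyve sources.

```rust
use std::ptr::write_volatile;

/// Hypothetical stand-in for the boot information field that the
/// hypervisor fills in for the guest (not the real uhyve struct).
struct BootInfo {
    cpu_freq_mhz: u32,
}

/// Treats a detected frequency of 0 MHz the same as "not detected".
fn usable_frequency(detected: Option<u32>) -> Option<u32> {
    detected.filter(|&mhz| mhz > 0)
}

/// Writes the frequency only when a usable value is available and
/// reports whether the write happened.
fn publish_frequency(boot_info: &mut BootInfo, detected: Option<u32>) -> bool {
    match usable_frequency(detected) {
        Some(mhz) => {
            // SAFETY: the pointer comes from a valid, exclusive reference.
            unsafe { write_volatile(&mut boot_info.cpu_freq_mhz, mhz) };
            true
        }
        None => {
            eprintln!("Unable to determine the CPU frequency, skipping the write");
            false
        }
    }
}

fn main() {
    let mut boot_info = BootInfo { cpu_freq_mhz: 0 };
    // Nested case from this thread: detection "succeeds" but yields 0.
    let written = publish_frequency(&mut boot_info, Some(0));
    println!("written: {}, value: {}", written, boot_info.cpu_freq_mhz);
    // Native case from this thread: 3500 MHz.
    let written = publish_frequency(&mut boot_info, Some(3500));
    println!("written: {}, value: {}", written, boot_info.cpu_freq_mhz);
}
```

With this shape the guest never sees a bogus 0 as if it were a valid measurement, and the host prints a message in both failure cases.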

This behavior is expected, I guess, since rusty-hermit on uhyve on native Ubuntu also can't read out the processor frequency with the CpuId crate. This implies that CpuId doesn't work in some virtual environments.

Kernel version virtual Ubuntu (vagrant box): 4.15.0-96-generic (this also has VirtualBox guest additions installed)
Kernel version native Ubuntu: 5.3.0-46-generic

jschwe commented 4 years ago

https://github.com/hermitcore/uhyve/pull/9 works on my local machine. With it, uhyve on virtual Ubuntu can detect the CPU frequency on my computer.

However, this doesn't work everywhere. For example, it doesn't work on travis: https://travis-ci.com/github/jschwe/rusty-hermit/jobs/326283414 The problem here is that the model name of the CPU is "Intel(R) Xeon(R) CPU", which doesn't contain the frequency. Since Intel Xeon CPUs are very common for servers, we should consider adding an additional method for determining the CPU frequency.
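To illustrate why the brand string approach works on the i5-6600K but not on the travis Xeon, here is a minimal sketch of such a parser. It is not the code from the pull request; `parse_brand_string_mhz` is a hypothetical helper, and it assumes the frequency, when present, is the last whitespace-separated token with a "GHz" or "MHz" suffix.

```rust
/// Extracts a clock speed in MHz from a CPU brand string such as
/// "Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz".
/// Returns None when the string carries no frequency.
fn parse_brand_string_mhz(brand: &str) -> Option<u32> {
    // Assumption: the frequency, if present, is the last token.
    let token = brand.split_whitespace().last()?;
    let (number, scale) = if let Some(n) = token.strip_suffix("GHz") {
        (n, 1000.0)
    } else if let Some(n) = token.strip_suffix("MHz") {
        (n, 1.0)
    } else {
        return None;
    };
    let value: f64 = number.parse().ok()?;
    let mhz = (value * scale).round();
    // Reject zero, negative and non-finite results.
    if mhz.is_finite() && mhz > 0.0 {
        Some(mhz as u32)
    } else {
        None
    }
}

fn main() {
    let brands = [
        "Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz",
        "Intel(R) Xeon(R) CPU",
    ];
    for brand in brands {
        println!("{:?} -> {:?}", brand, parse_brand_string_mhz(brand));
    }
}
```

The second brand string has no frequency token at all, so any parser of this kind has to report a failure for it and a different detection method is needed.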

When looking at the job log you can also see that the original error reason, which should have been "Could not determine the processor frequency" from the failed expect, is not printed. This only happens in nested environments. It is also not completely consistent, since there were cases where I have seen the reason for a panic printed out on travis or in my VirtualBox guest. I'll try to investigate this further. Worth noting in this context is that the following messages only appeared after I added some debug printlns to libhermit, without changing anything else.

[0][TRACE] __sys_malloc: allocate memory at 0x600180 (size 0x408, align 0x8)
[0][ERROR] Page Fault (#PF) Exception: ExceptionStackFrame {
    instruction_pointer: 0x37569b,
    code_segment: 0x8,
    cpu_flags: 0x10283,
    stack_pointer: 0x1ffb90,
    stack_segment: 0x10,
}
[0][ERROR] virtual_address = 0x3830000, page fault error = The fault was caused by a non-present page.
The access causing the fault was a read.
The access causing the fault originated when the processor was executing in supervisor mode.
The fault was not caused by reserved bit violation.
The fault was not caused by an instruction fetch.
[0][ERROR] fs = 0x0, gs = 0x44A678

Before adding the prints, the error contained much less info. Adding the first prints changed the error.

You can view the changes I made here: https://github.com/hermitcore/rusty-hermit/pull/5 I basically only added debug outputs and the travis pipeline.

jschwe commented 4 years ago

I've now also tested this with my second travis pipeline. When run without the debug prints I get the following error:

thread '<unnamed>' panicked at 'attempt to create unaligned or null slice', /home/travis/build/jschwe/hermit/rust/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/src/rust/src/libcore/slice/mod.rs:5694:5

stack backtrace:

With the added debug prints I actually get the expected error message this time:

thread '<unnamed>' panicked at 'Could not determine the processor frequency: ()', src/arch/x86_64/kernel/processor.rs:419:9

note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

[0][INFO] Shutting down system

However either the kernel or uhyve doesn't terminate correctly. It hangs and is terminated by travis after 10 minutes. I recall having seen this behaviour about two or three times locally too.

Something strange is definitely going on here. Could there be some kind of race condition in the panic handler?

stlankes commented 4 years ago

@jschwe Can you check whether hermitcore/libhermit-rs#48 determines the CPU frequency correctly on your test setup?

jschwe commented 4 years ago

@stlankes I checked, and this does not determine the CPU frequency on travis. It might work if we use the cpuid function from uhyve, so I'll test that when I have time and write an update here.

jschwe commented 4 years ago

Update: Using this method in uhyve also doesn't work on travis. raw_cpuid is able to detect that the hypervisor is kvm, but returns a frequency of 0.
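A possible reason why the vendor is detected while the frequency is not: the two values come from different CPUID leaves. The sketch below only decodes register values and does not execute CPUID itself. The KVM signature in leaf 0x4000_0000 is well known; that a timing leaf (commonly 0x4000_0010) reports the TSC frequency in kHz in EAX is an assumption about the hypervisor, and both function names are made up for illustration rather than taken from raw_cpuid.

```rust
/// Reassembles the 12-byte hypervisor signature that CPUID leaf
/// 0x4000_0000 returns in EBX, ECX and EDX (in that order).
fn hypervisor_signature(ebx: u32, ecx: u32, edx: u32) -> String {
    let mut bytes = Vec::with_capacity(12);
    for reg in [ebx, ecx, edx] {
        bytes.extend_from_slice(&reg.to_le_bytes());
    }
    // Signatures are padded with NUL bytes, e.g. "KVMKVMKVM\0\0\0".
    String::from_utf8_lossy(&bytes)
        .trim_end_matches('\0')
        .to_string()
}

/// Interprets EAX of a timing leaf, assumed to hold the TSC frequency
/// in kHz. A value of 0 is treated as "not reported".
fn tsc_mhz(eax_khz: u32) -> Option<u32> {
    if eax_khz == 0 {
        None
    } else {
        Some(eax_khz / 1000)
    }
}

fn main() {
    // Register values as KVM reports them for leaf 0x4000_0000.
    let ebx = u32::from_le_bytes(*b"KVMK");
    let ecx = u32::from_le_bytes(*b"VMKV");
    let edx = u32::from_le_bytes(*b"M\0\0\0");
    println!("vendor: {}", hypervisor_signature(ebx, ecx, edx));
    // The situation described above: vendor known, frequency 0.
    println!("frequency: {:?}", tsc_mhz(0));
}
```

Under these assumptions, a host that exposes the vendor leaf but leaves the timing leaf empty would produce exactly the "kvm, but frequency 0" result described above.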

stlankes commented 4 years ago

Do you still get a page fault sometimes?

jschwe commented 4 years ago

Currently I'm not experiencing any panics, and therefore no page faults, when running rusty-demo. However, I believe there is still an issue with the panic_handler, since I can still reproduce the page fault when deliberately panicking (https://github.com/hermitcore/libhermit-rs/issues/43).