rowlandwatkins opened this issue 6 years ago
Hi @rowlandwatkins , there are two things to check:
$ capstan config print
--- global configuration
CAPSTAN_ROOT: /home/miha/.capstan
CAPSTAN_REPO_URL: https://mikelangelo-capstan.s3.amazonaws.com/
CAPSTAN_DISABLE_KVM: false # <--------------------------------
CAPSTAN_QEMU_AIO_TYPE: native # <--------------------------------
Please see this document about how to configure these parameters, but basically you just need to create a $HOME/.capstan/config.yaml file with the following content:
disable_kvm: false
qemu_aio_type: native
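For convenience, the file can be created from a shell like this (a minimal sketch; the values are the ones shown above):

```shell
# Create the Capstan config file with the two settings discussed above.
# "native" aio requires a QEMU build that supports it.
mkdir -p "$HOME/.capstan"
cat > "$HOME/.capstan/config.yaml" <<'EOF'
disable_kvm: false
qemu_aio_type: native
EOF
```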
However, please make sure that KVM acceleration (which is basically the vmx capability you've mentioned) is really supported:
$ kvm-ok
INFO: /dev/kvm exists
KVM acceleration can be used
and that your QEMU supports the native aio type (see this PR).
Which step is slow for you: capstan package compose?
Hi @miha-plesko yeah I'm noticing a slowdown compared to the old Capstan. Here are the command line arguments:
Old Capstan:
/usr/bin/qemu-system-x86_64 [-nographic -m 1024 -smp 2 -device virtio-blk-pci,id=blk0,bootindex=0,drive=hd0 -drive file=/home/rowland/.capstan/instances/qemu/blah/disk.qcow2,if=none,id=hd0,aio=native,cache.direct=off,cache=writeback -device virtio-rng-pci -chardev stdio,mux=on,id=stdio,signal=off -device isa-serial,chardev=stdio -netdev user,id=un0,net=192.168.122.0/24,host=192.168.122.1 -device virtio-net-pci,netdev=un0 -chardev socket,id=charmonitor,path=/home/rowland/.capstan/instances/qemu/blah/osv.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -enable-kvm -cpu host,+x2apic]
New Capstan:
/usr/bin/qemu-system-x86_64 -nographic -m 1024 -smp 2 -device virtio-blk-pci,id=blk0,bootindex=0,drive=hd0 -drive file=/home/rowland/.capstan/instances/qemu/bobunikernel/disk.qcow2,if=none,id=hd0,aio=native,cache=none -device virtio-rng-pci -chardev stdio,mux=on,id=stdio,signal=off -device isa-serial,chardev=stdio -netdev user,id=un0,net=192.168.122.0/24,host=192.168.122.1 -device virtio-net-pci,netdev=un0 -chardev socket,id=charmonitor,path=/home/rowland/.capstan/instances/qemu/bobunikernel/osv.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control
One other thing worth mentioning is that I'm running all this in nested KVM. This might be why -enable-kvm is in the old Capstan. When we build images on Jenkins it's also dog slow, but we know that AWS doesn't support KVM, so we disable it.
Cheers,
Rowland
Hey @rowlandwatkins, as you say, the most notable difference between the two commands above is the -enable-kvm switch. I can confirm that QEMU operates horribly slowly when KVM acceleration is not supported, so could you please try to turn it on?
When I run an image with the new Capstan, I can see that the -enable-kvm switch is turned on as well:
/usr/bin/qemu-system-x86_64 -nographic -m 1024 -smp 2 -device virtio-blk-pci,id=blk0,bootindex=0,drive=hd0 -drive file=/home/miha/.capstan/instances/qemu/demo/disk.qcow2,if=none,id=hd0,aio=native,cache=none -device virtio-rng-pci -chardev stdio,mux=on,id=stdio,signal=off -device isa-serial,chardev=stdio -netdev user,id=un0,net=192.168.122.0/24,host=192.168.122.1 -device virtio-net-pci,netdev=un0 -chardev socket,id=charmonitor,path=/home/miha/.capstan/instances/qemu/demo/osv.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -enable-kvm -cpu host,+x2apic
So could you please try to either provide $HOME/.capstan/config.yaml (see my comment above) or specify the following environment variable:
$ export DISABLE_KVM=false
$ capstan run myunikernel
Doing this, the -enable-kvm switch should appear and things should run fast again. Naturally, this won't work on AWS if nested KVM is not supported there. But still, I see no reason why the old Capstan would work any faster than the new one, because they use literally the same command underneath.
@rowlandwatkins in case you are interested, there is a way to speed up your Jenkins build on AWS, but it's not really tested yet (I've only tested it locally; see this PR and this thread). In short, you can run OSv unikernels natively on AWS instead of inside the Jenkins VM - this way KVM acceleration will be present.
The "capstan compose-remote" approach could be used like this: replace the
capstan package compose myunikernel
command with
capstan package compose-remote IP
where IP is the IP of the OSv EC2 instance. And voilà! You have your OSv unikernel composed natively on AWS. You will need to restart it in order for the new bootcmd to be run.
Now the odd thing with this approach is that AWS interaction is not yet integrated into Capstan (upload the AMI image, boot an EC2 instance out of it, reboot the EC2 instance...), so you'd need to automate this yourself (e.g. with Jenkins). I'd be glad to support you, but please bear in mind I'm doing this in my spare time.
BTW: Here is some documentation for preparing base unikernel: compose-remote-base, and for contextualizing it remotely: compose-remote.
@wkozaczuk - did you happen to test compose-remote approach on AWS? I only tested it locally and it worked.
Hi @miha-plesko, strange, I modified $HOME/.capstan/config.yaml; perhaps I need to restart my shell. I'll also try setting the env variable as you suggest.
Regarding your idea on AWS - we currently do something similar - we have a horrific Groovy script which manages the AMI lifecycle on EC2. We build a custom OSv image, copy it into an EBS blockstore, snapshot it, then create an AMI, then start the AMI, set DNS, etc. (including all the code to poll for readiness). One reason for doing this is that I'm not keen on modifying a running image - I'd like to have Jenkins tag image versions so we can handle changes and effectively make each image immutable. We can replace an image in just under 10 minutes, which isn't bad. I'll take a look at the remote approach; if anything, we may like to snapshot after capstan package compose-remote IP, then create another AMI from that.
You can use capstan config print to print the current Capstan configuration to the console.
Yes I did test it on AWS. I specifically came up with this mode to speed up deploying to EC2 instance.
The idea was to have a base AMI (it could include a JRE if you deploy Java apps) with the OSv cpiod enabled, and then use it every time to create app-specific AMIs. I discovered that taking a snapshot of an EC2 instance created this way is much faster (under a minute) - I can send you some specifics later.
Waldek
@wkozaczuk very cool, I'll try this approach then. Currently, snapshots take over a minute, so it will be nice to try this route. In particular, this will remove the need to copy an entire base image to EBS before snapshotting, saving the several minutes needed to create, copy and then delete the EBS volume.
@miha-plesko Found the issue with KVM acceleration detection - you assume kvm-ok exists; Ubuntu has it, but Arch derivatives don't. I just nabbed the bash script from Launchpad and it worked. You might want to update the docs to reflect the need for kvm-ok (cpu-checker in Ubuntu), but I'm not sure how many other distros keep it in their repos.
Cheers,
Rowland
hehe, my problem now is that the VM doesn't boot ;) Just hangs on "Booting from disk..."
Hm, that's strange, I didn't know we had this kvm-ok dependency in the code - I've opened an issue for that.
Unfortunately I have no idea what prevents your unikernel from starting, but is it possible that KVM actually isn't enabled? Could you try to verify that by other means than kvm-ok, e.g.:
$ cat /proc/cpuinfo | grep vmx
# should yield at least one CPU with vmx flag
See this article. Looking forward to fixing this problem!
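For what it's worth, on distros that don't ship kvm-ok, a rough hand-rolled equivalent (my own sketch, not part of Capstan) would be:

```shell
# Approximate what kvm-ok checks, without the cpu-checker package:
# a virtualization-capable CPU (vmx for Intel, svm for AMD) and a
# usable /dev/kvm character device node.
if grep -qE 'vmx|svm' /proc/cpuinfo; then
    echo "CPU virtualization flags: present"
else
    echo "CPU virtualization flags: missing"
fi
if [ -c /dev/kvm ]; then
    echo "/dev/kvm: present"
else
    echo "/dev/kvm: missing (is the kvm module loaded?)"
fi
```

Note that in nested virtualization the vmx flag only shows up if the outer hypervisor passes it through to the guest.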
Ok, so digging further, KVM appears to be fragged - not sure if it's my kernel or my distro (Artix). If I turn off CPU counters it gets upset about MSRs, but otherwise it crashes on pretty much everything, which is frustrating. cpuinfo has vmx listed, so the capabilities are there, but it's also possible QEMU has changed a lot too (previously using 2.5.0, now using 2.11.x).
@rowlandwatkins can you please try setting aio to "threads"? I remember we had some problems with aio "native" even on QEMU 2.5.0; that's why we now set it to "threads" by default. Just put this into your $HOME/.capstan/config.yaml:
qemu_aio_type: threads
Hi @miha-plesko alas no improvement, all diagnostics suggest that KVM acceleration should be available, but it's clearly not working. I'll try a clean VM tomorrow to test nested KVM again.
Soooo, I've torn through several Artix, Devuan and Ubuntu installs to come to the conclusion that something is seriously wrong with qemu/kvm on recent kernels.
My current symptom is that running capstan package compose somekernel --pull-missing just freezes on "setting cmdline: --norandom".
KVM claims to work and it did prior to updating to Ubuntu 17.04 - capstan run somekernel now just freezes qemu, requiring a killall.
Do you folks know any way to find out where qemu is failing?
Current configuration: VMware Workstation 14.1.1 build-7528167, Ubuntu 17.04, QEMU 2.10.1, Linux kernel 4.13.0-36-generic.
Does adding the -v flag (capstan package compose ... -v) yield any more logs?
@miha-plesko I see the list of .capstanignore folders but that's it - qemu stops working whether I tell it to use kvm or not
OK, so a fresh 16.04 with qemu 2.5.0 works fine....
super bizarre
@gasper-vrhovsek I think you managed to run Capstan on Ubuntu 17.04, right?
@rowlandwatkins can you please show us your capstan config print output?
Alas, no, it seems my earlier successes were actually 16.04 LTS. I then upgraded to 17.04 for bug fixes, spectre, etc.
The bigger issue, it seems, in the QEMU 2.10.x line revolves around the need to set CPU counters when doing nested virtualization. QEMU 2.5.0 doesn't throw an error complaining about failing to set MSRs; only by passing through the CPU counters in QEMU 2.10.x does it stop failing, but then it silently fails forever on doing anything useful. This happens on Artix, Devuan and Ubuntu, so it's perhaps less a kernel issue and more a combinatorial packaging issue.
I'm currently upgrading my 16.04 test VM to 17.04 to validate my assumption regarding QEMU 2.10.x. If it does fail in a reliable fashion, I'll try upgrading again to 17.10 to see if QEMU 2.11.x acts any differently.
Apologies, I misunderstood who you were addressing regarding QEMU under 17.04! I'm curious whether Gasper has a similar setup under VMware...
Hi @rowlandwatkins @miha-plesko, I think I was still on 16.10 at the time, and for me the problem was solved with the qemu_aio_type fix @miha-plesko suggested earlier (https://github.com/mikelangelo-project/capstan/pull/77). I hadn't yet tried on 17.04.
Hi @gasper-vrhovsek, thanks for the tip, I'll also try changing the qemu_aio_type on 17.04 and see what happens.
@rowlandwatkins if you run
$ capstan config print
you will see in console if you have this enabled or not.
@miha-plesko, yeah I think I do have "native" activated. I'll try removing it from the config first, and test again in the morning. Not sure if this will remove the CPU counters issue, but at least we can rule out whether aio is misbehaving.
Right, the plot thickens...
Running capstan 0.3.0 on Ubuntu 17.10 causes the msr failure:
kvm.c:1797:kvm_put_msrs: Assertion ret == cpu->kvm_msr_buf->nmsrs failed
Qemu version: Debian 1:2.10+dfsg-0ubuntu3.5
According to Ubuntu package lists, 2.11 is slated for Bionic, which I think is 18.04.
Now, activating "virtualise CPU performance counters" removes the above assertion error, but leaves qemu forever hanging.
So, 17.10 upgrade won't help, this looks more like a qemu issue, perhaps I'll even run a custom version because of this...
OK, I think we have a winner!
See: https://bugs.launchpad.net/qemu/+bug/1636217
Environment: 1) nested virtualisation on VMware (could be VMware-specific), 2) QEMU 2.7+, 3) virtio PCI.
There appears to be a qemu bug when using Virtio disk drive. The above link gives a rundown, specifically affecting KVM-accel in nested virtualisation. It looks like this was partially caused by changes to SeaBIOS, but doesn't always help. The following solutions exist:
a) don't use KVM
b) add "-machine type=pc-i440fx-x" where x <= 2.6 (I guess this then uses a different BIOS?)
c) set "disable-modern=on" on the virtio PCI device
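For illustration, options b) and c) would look roughly like this on a QEMU command line (trailing arguments elided; -machine and the virtio-pci disable-modern property are standard QEMU options, but the exact device arguments here are illustrative, not taken from Capstan output):

```shell
# b) force the older i440fx machine type so the BIOS copes with virtio:
qemu-system-x86_64 -machine type=pc-i440fx-2.6 ...

# c) alternatively, disable virtio "modern" mode on the disk device:
qemu-system-x86_64 -device virtio-blk-pci,disable-modern=on,id=blk0,drive=hd0 ...
```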
Is there any way with the new capstan to add arbitrary qemu options?
I'm currently using option c (qemu 2.10.1) and capstan works just like using qemu 2.5.0.
EDIT: the reason qemu was hanging on boot was the BIOS not behaving with virtio - the result is no boot device, so SeaBIOS just sits there wondering what to do next...
Cheers!
I've modified my local capstan to use the "-machine" switch and now vm building and running work correctly!
We want a PullRequest or it didn't happen!
(But thanks for diving into this, I'm so happy you managed to get it to work!)
@miha-plesko Well this is an interesting question: how do we want to patch it? I don't know how many others use nested virtualisation with KVM on newer versions of QEMU. Do you guys want it as a conditional? I really don't know the extent of the issue, or whether setting -machine on QEMU < 2.7 would be an issue.
All I've done is modify hypervisor/qemu/qemu.go:322
args = append(args, "-machine", "type=pc-i440fx-2.6")
Cheers
Thanks for the exact diff, now I can easily integrate it myself (unless you want to have an official contribution in the Capstan repository ;). I think introducing a new option qemu_machine_type in .capstan/config.yaml:
qemu_machine_type: pc-i440fx-2.6
would be best. And if the user specifies nothing, then no -machine flag would be added. This way we have all systems covered, even if not automatically.
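A sketch of that behaviour in shell form (illustrative only; the real change would live in hypervisor/qemu/qemu.go, and the variable names here are made up):

```shell
# Only append the -machine flag when qemu_machine_type is configured;
# an empty value stands for "user specified nothing in config.yaml".
QEMU_MACHINE_TYPE="pc-i440fx-2.6"

ARGS="-nographic -m 1024"
if [ -n "$QEMU_MACHINE_TYPE" ]; then
    ARGS="$ARGS -machine type=$QEMU_MACHINE_TYPE"
fi
printf '%s\n' "$ARGS"
# prints: -nographic -m 1024 -machine type=pc-i440fx-2.6
```

With QEMU_MACHINE_TYPE left empty, the args stay exactly as before, so existing setups are unaffected.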
My pleasure, Capstan and OSv are great projects, and they have helped me a lot. This particular issue has been plaguing me for some time, so it's nice to have a solution that stops me firefighting. Cool, qemu_machine_type sounds good :)
Maybe also add some blurb to the wiki to help others in a similar position:
For those using nested virtualisation in VMware, be aware that for QEMU versions > 2.6.0 there are some virtio virtual disk issues when running with KVM acceleration:
1) In Virtual Machine Settings | Processors, set Virtualize Intel VT-x/EPT and Virtualize CPU Performance Counters
2) In .capstan/config.yaml set qemu_machine_type: pc-i440fx-2.6
With the above modifications, generating and running an OSv image should be painless and fast.
Hi folks,
Is there any way I can add the vmx option to qemu's commandline arguments? Running under qemu at present is rather sloooow....
Cheers,
Rowland