rancher / old-vm

(OBSOLETE) Package and Run Virtual Machines as Docker Containers
Apache License 2.0

Creating vm results in pending state and error in controller #114

Open NicolaiSchmid opened 5 years ago

NicolaiSchmid commented 5 years ago

Hey, I've just set up a brand new installation of Kubernetes and RancherVM. I deployed the latest version (a36f4ce24ef6d23f2f2cd10e7c41a750322518d2) and created an ubuntu-server VM through the web interface. Since creation it has been stuck in a pending state, as the Kubernetes description of the VM shows:

kubectl describe virtualmachines.vm.rancher.io test-vm
Name:         test-vm
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  vm.rancher.io/v1alpha1
Kind:         VirtualMachine
Metadata:
  Creation Timestamp:  2018-08-25T21:19:23Z
  Finalizers:
    deletion.vm.rancher.io
  Generation:        1
  Resource Version:  28056
  Self Link:         /apis/vm.rancher.io/v1alpha1/virtualmachines/test-vm
  UID:               89aec1c3-a8ac-11e8-823a-c8f7337ecd45
Spec:
  Action:          start
  Cpus:            1
  Hosted _ Novnc:  false
  Image:           rancher/vm-ubuntu:16.04.4-server-amd64
  Memory _ Mb:     512
  Node _ Name:     
  Public _ Keys:
    james
Status:
  Id:              i-89aec1c3
  Ip:              
  Mac:             06:fe:89:ae:c1:c3
  Node _ Ip:       10.13.42.71
  Node _ Name:     wh84hs7idph
  State:           pending
  Vnc _ Endpoint:  
Events:            <none>

The controller logged a warning saying that the VM pod could not be updated:

W0825 21:19:23.456462       1 controller.go:279] error updating vm pod default/test-vm: Operation cannot be fulfilled on virtualmachines.vm.rancher.io "test-vm": the object has been modified; please apply your changes to the latest version and try again

I've not made any changes to the VM since clicking the create button in the web interface, and my machine should have sufficient resources for creating a VM. Right now I'm fairly clueless about what could cause this error and how to repair the system. Any ideas? Or even a possible fix on my end? Thanks, Nicolai

makomatic commented 5 years ago

I see exactly the same thing. How can we help debug this?

LLParse commented 5 years ago

Hi @NicolaiSchmid @makomatic, the observed warning is probably a red herring, unless it's happening continuously. Something's definitely not right, though. Could you please try fetching the VM pod via kubectl get pod/test-vm-xxxxxxxx -o yaml? If the container is starting, please post some logs from this pod too.
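
For context on why a single occurrence of that warning is usually harmless: "the object has been modified" is what the Kubernetes API returns when an update is based on a stale read of the object, and the usual remedy is to re-read and try again. The sketch below illustrates that pattern with a toy in-memory store. It is not RancherVM's controller code; `Store`, `Conflict`, and `update_with_retry` are hypothetical names made up for this illustration.

```python
class Conflict(Exception):
    """Raised when an update carries a stale resource version."""


class Store:
    """Toy in-memory stand-in for an API server holding one object."""

    def __init__(self, spec):
        self.version = 1
        self.spec = dict(spec)

    def get(self):
        # Return the version the object was read at, plus a copy of it.
        return self.version, dict(self.spec)

    def update(self, read_version, new_spec):
        # Reject writes that were computed from an outdated read.
        if read_version != self.version:
            raise Conflict("the object has been modified")
        self.spec = dict(new_spec)
        self.version += 1


def update_with_retry(store, mutate, attempts=5):
    """Re-read and re-apply `mutate` until the write lands.

    Returns the number of attempts used; raises Conflict if every
    attempt loses the race.
    """
    for attempt in range(1, attempts + 1):
        version, spec = store.get()
        mutate(spec)
        try:
            store.update(version, spec)
            return attempt
        except Conflict:
            continue
    raise Conflict(f"still conflicting after {attempts} attempts")
```

If the update succeeds on a later attempt, the conflict leaves nothing behind except a log line. A warning that repeats indefinitely would point to something rewriting the object on every pass, which is a different problem.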

twitchax commented 5 years ago

I am having the same issue.

Here are my vm pod logs.

+ : ens33 
+ : br0 
+ : 8192 
+ : 4 
+ : 06:fe:fb:77:3c:e7 
+ : false 
+ : 4444 
+ : -drive 'file=$KVM_IMAGE,if=none,id=drive-disk0,format=qcow2' -device virtio-blk-pci,scsi=off,drive=drive-disk0,id=virtio-disk0,bootindex=1 
+ : -drive 'file=$KVM_IMAGE,if=none,id=drive-disk0,format=raw' -device virtio-blk-pci,scsi=off,drive=drive-disk0,id=virtio-disk0,bootindex=1 
+ : -netdev 'bridge,br=$BRIDGE_IFACE,id=net0' -device 'virtio-net-pci,netdev=net0,mac=$MAC' 
+ '[' '' = bash ']' 
+ KVM_ARGS= 
+ '[' -e /dev/vm/root ']' 
+ BASE_IMAGE_DIR_LIST=(`ls /base_image`) 
++ ls /base_image 
+ '[' 1 -ne 1 ']' 
+ '[' '!' -d /image ']' 
+ KVM_IMAGE=/image/sda.qcow2 
+ '[' '!' -f /image/sda.qcow2 ']' 
+ VOLUMES_DIR=/volumes/ 
++ find /volumes/ -name '*.img' 
++ sort -d 
find: '/volumes/': No such file or directory 
+ VOLUMES_LIST= 
+ extra_kvm_blk_opts= 
+ for volume in '$VOLUMES_LIST' '/dev/vm/disk*' 
+ '[' -e '/dev/vm/disk*' ']' 
+ KVM_BLK_OPTS='-drive file=$KVM_IMAGE,if=none,id=drive-disk0,format=qcow2 -device virtio-blk-pci,scsi=off,drive=drive-disk0,id=virtio-disk0,bootindex=1' 
+ setup_bridge ens33 br0 
+ target_iface=ens33 
+ bridge_iface=br0 
+ must_exist ifconfig 
+ which ifconfig 
+ '[' 0 '!=' 0 ']' 
+ must_exist brctl 
+ which brctl 
+ '[' 0 '!=' 0 ']' 
+ must_exist route 
+ which route 
+ '[' 0 '!=' 0 ']' 
+ ifconfig ens33 
+ '[' 1 '!=' 0 ']' 
+ echo 'Target interface ens33 does not exist' 
+ exit 1 
Target interface ens33 does not exist 

Looks like a bridge or networking issue? The host OS is Rancher OS, so I am not sure if that changes the equation.
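
The last lines of the trace show the startup script probing the target interface with ifconfig and exiting with status 1 when the probe fails. A rough way to run the same check on a host before deploying is sketched below. It assumes Python is available on the node; `interface_exists` and `check_target` are hypothetical helper names, not part of RancherVM.

```python
import socket


def interface_exists(name, available=None):
    """Return True if `name` is among the host's network interfaces.

    `available` lets callers pass a known list of names; by default
    the names are read from socket.if_nameindex().
    """
    if available is None:
        available = [ifname for _, ifname in socket.if_nameindex()]
    return name in available


def check_target(name, available=None):
    """Mirror the trace: return (exit_code, message) for the target interface."""
    if interface_exists(name, available):
        return 0, ""
    return 1, f"Target interface {name} does not exist"
```

Calling `check_target("ens33")` on the node reports whether the interface name the pod was given actually exists there.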

LLParse commented 5 years ago

@twitchax We've never tried using RancherOS as the host OS; typically we've used Ubuntu LTS. I'm not sure if bridge networking is even possible in RancherOS. Here's a doc on setting up the networking: https://github.com/rancher/vm/blob/master/docs/networking.md

The error you see indicates a networking issue. The interface name to be bridged must be named appropriately in the deployment manifest: https://github.com/rancher/vm/blob/master/deploy/ranchervm.yaml#L143. Configuring networking is a manual process for the moment.

twitchax commented 5 years ago

@LLParse, interesting.

I set up the bridge, but it is still complaining about ens33 for some reason.

Also, do you guys have some docs on developing the frontend? I'd like to make a PR that is more descriptive when the create fails with 400 Bad Request.

twitchax commented 5 years ago

Ahhh, it turns out that the ens33 interface is hard-coded in the vm-controller args. If I change that to my device, eth0, it works fine. Also, the bridging is set up automatically.

When you guys make this a Helm chart, you may want to make that an option.
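
As a rough illustration of the workaround, the sketch below swaps one interface name for another in a container argument list. The flag name in the example is hypothetical; the real argument names are whatever the deployment manifest uses, and `replace_interface` is a made-up helper.

```python
def replace_interface(args, old, new):
    """Return a copy of `args` with interface name `old` swapped for `new`.

    Handles both "--flag=value" and bare "value" tokens. Only exact
    value matches are replaced, so "ens33" does not touch "ens330".
    """
    result = []
    for arg in args:
        flag, sep, value = arg.partition("=")
        if sep and value == old:
            result.append(f"{flag}={new}")
        elif arg == old:
            result.append(new)
        else:
            result.append(arg)
    return result


# Hypothetical argument list; the real flag names live in the manifest.
args = ["--example-iface=ens33", "--verbose"]
patched = replace_interface(args, "ens33", "eth0")
```

Matching on the exact value rather than a substring keeps the swap from touching unrelated arguments that happen to contain the old name.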

LLParse commented 5 years ago

@twitchax Thanks for the feedback! RancherVM v0.2.0 is released. There's a Rancher chart available at https://github.com/LLParse/charts-rancher/tree/ranchervm-charts/proposed/ranchervm/latest

Although I haven't tried it, the chart should work with Helm with one exception: the frontend service type will need to change to NodePort or LoadBalancer to expose the UI outside the cluster. Rancher has a feature that exposes the service behind a proxy, so accessing it requires authenticating with Rancher.
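
The service type change amounts to setting `spec.type` on the frontend Service. A minimal sketch of that edit on a manifest loaded as a Python dict follows. The chart's actual Service name, ports, and current type are not given in this thread, so the values in any example manifest are assumptions, and `expose_service` is a hypothetical helper.

```python
import copy

EXTERNAL_TYPES = ("NodePort", "LoadBalancer")


def expose_service(manifest, service_type="NodePort"):
    """Return a copy of a Service manifest with spec.type set for external access."""
    if manifest.get("kind") != "Service":
        raise ValueError("expected a Service manifest")
    if service_type not in EXTERNAL_TYPES:
        raise ValueError(f"unsupported type: {service_type}")
    # Work on a deep copy so the caller's manifest is left untouched.
    patched = copy.deepcopy(manifest)
    patched.setdefault("spec", {})["type"] = service_type
    return patched
```

Returning a copy makes it easy to compare the original and patched manifests before applying anything to a cluster.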

twitchax commented 5 years ago

@LLParse, awesome: thanks for the update!