rancher / os

Tiny Linux distro that runs the entire OS as Docker containers
https://rancher.com/docs/os/v1.x/en/
Apache License 2.0

Latest RancherOS ISO freezes at boot time on VirtualBox #389

Closed: ArKam closed this issue 9 years ago

ArKam commented 9 years ago

Hi guys,

I'm currently setting up a RancherOS host on VirtualBox v4.3.28 from the latest RancherOS ISO available on your website, https://releases.rancher.com/os/latest/rancheros.iso (v0.3.1 at the time of writing).

I want to set up this host for a demo of what RancherOS can do and how it could help us renew our production infrastructure (around 100 hosts on AWS and bare metal).

However, I'm facing a rather strange error involving the 9pnet device. After installing the ISO to /dev/sda, everything seems OK and RancherOS asks for a reboot. After rebooting, the host hangs indefinitely with the following message:

9pnet: Could not find request transport: virtio.

The cloud-config.yml I used is available at the following URL: https://github.com/ArKam/seed/blob/master/cloud-config.yml

So, could you help me with this error?

mbettan commented 9 years ago

Do you think the issue is my cloud-init file, or is ESXi 5.5 hypervisor support not fully working yet?

colinwilson commented 9 years ago

I'm experiencing the same behaviour. Install completes without error.

VMware ESXi 5.5 U3. My cloud-config.yml file.

v0.3.3 (hangs indefinitely) [screenshot: rancheros-v0.3.3-fail-esxi-5.5]

v0.4.0-rc11 (stuck in a reboot loop) [screenshot: rancheros-v0.4.0-rc11-fail-esxi-5.5]

EDIT: just noticed an error in my cloud-config.yml. Will try again and post results.

wicadmin commented 9 years ago

Same issue here, on ESXi 6.

[screenshot: ramcheros-9p]

Ping works, but SSH doesn't. I am really surprised that the team developing this has not tested on ESXi. That and other hypervisors are where the market is.

#cloud-config 

ssh_authorized_keys:
 - ssh-rsa AAAA..........wROhBFK12+6APJvQ==

hostname: rancheros-01

rancher:
  network:
    dns:
      nameservers:
      - 192.168.211.1
    interfaces:
      eth*:
        dhcp: false
      eth0:
        address: 192.168.211.10/24
        gateway: 192.168.211.1
        mtu: 1500

ibuildthecloud commented 9 years ago

@2devnull I just tested on VMware Workstation 11 and ESXi 6 and both worked fine for me. Are you sure the SSH key is correct and you are logging in with the "rancher" user? If ping works then networking came up.

Regarding testing on VMware, let me clarify. We don't regularly test on ESXi. We have yet to find an issue where something works on Workstation/Fusion but does not work on ESXi. For that reason, we don't run an ESXi environment for testing and usually just test on Workstation/Fusion. We have users running on VMware today with no issues.

wicadmin commented 9 years ago

Thanks for the clarification and for testing on ESXi 6. Could you share the cloud-config file you used, or show what a "good" cloud-config file is supposed to look like? Also, was this testing done with 0.3.3?

I am doing:

ssh rancher@192.168.211.10

and getting:

ssh: connect to host 192.168.211.10 port 22: Connection refused
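
Ping succeeding while SSH reports "Connection refused" usually means the host is reachable and is actively rejecting connections on port 22 (typically nothing is listening there yet), which is a different situation from a timeout caused by routing or filtering. A minimal sketch in Python, run from the client machine, can tell the cases apart; `probe_tcp` is a hypothetical helper written for illustration, not part of RancherOS or any SSH tooling.

```python
import socket


def probe_tcp(host: str, port: int = 22, timeout: float = 3.0) -> str:
    """Classify a TCP port as 'open', 'refused', 'timeout', or 'error: ...'.

    Hypothetical diagnostic helper:
    - 'open'    : the TCP handshake completed (something is listening).
    - 'refused' : the host answered but rejected the connection.
    - 'timeout' : no answer within `timeout` seconds.
    """
    try:
        # create_connection performs the full TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # e.g. "No route to host"; must come after the more specific cases.
        return f"error: {exc.strerror}"
```

For example, `print(probe_tcp("192.168.211.10"))` would print `refused` in the situation described above, pointing at the SSH service inside the VM rather than at the network configuration.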

mbettan commented 9 years ago

My 2 cents: if your target is an enterprise-grade solution for running Docker on dev/QA/staging/prod, you should include ESX, Hyper-V, KVM, and public clouds in your testing plans. There are many differences between Fusion/Workstation and hypervisors like ESX. If your target is only developers on their laptops, you should also consider Vagrant testing in addition to Fusion/Workstation.

Can you share your cloud-init file? How did you get it working? I understood that you had solved this blocking issue (caused by a wrong cloud-init file) with the current beta builds; that's not the case, right?

Thanks

ibuildthecloud commented 9 years ago

@mbettan I hope I didn't give you the impression that we only test VMware Workstation/Fusion. Trust me, we know all the complexities of hypervisors. What we currently test is KVM, VMware Fusion/Workstation, ESXi, xhyve, VirtualBox, EC2, GCE, Azure, Hyper-V, and Xen. Unofficially, we also test Vultr, Linode, and DigitalOcean, plus various bare-metal configurations and small form factor devices such as the MinnowBoard MAX.

I realize now that when I tested ESXi 6 just now, I didn't install to disk. I will try that tomorrow. Maybe there is an issue with the adapter drivers; VMware has various storage options, so I'll try them out.

Since I booted from the ISO, I did not use a cloud-config file.

mbettan commented 9 years ago

Booting from the ISO worked for me on ESXi 5.5; the issue is related to installing to disk.

wicadmin commented 9 years ago

@ibuildthecloud were you able to test installing to disk, and did you reproduce the issue?

imikushin commented 9 years ago

I've installed RancherOS v0.4.0-rc11 to disk, and it booted just fine on an ESXi 6 hypervisor running on the latest VMware Fusion Pro. I'm going to investigate this a little more. At the very least, expect a write-up on a working configuration.

One important point right now: make sure you provide enough memory to the RancherOS VM; 1024 MB should work.

(Sent as an email reply to @2devnull's comment of Thu, Oct 22, 2015, 21:59.)

sheng-liang commented 9 years ago

I have no problem getting 0.4.0-rc11 running on ESXi 6.0 Update 1. I was able to get network access.

wicadmin commented 9 years ago

@sheng-liang - wonderful!

Two things:

Thanks

deniseschannon commented 9 years ago

@2devnull The ISO is available on our releases page: https://github.com/rancher/os/releases/tag/v0.4.0-rc11

mbettan commented 9 years ago

If I understand correctly, our issue is related to a bad cloud-config file.

What troubleshooting improvements does rc11 bring? I didn't understand what changed in the last release.

Is there any standalone tool or pre-check on the roadmap to validate the cloud-init file before installing to disk?
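
The thread doesn't name such a tool. As a rough stopgap, here is a minimal pre-check sketch in Python that catches two common mistakes before installing to disk. It assumes, as with cloud-init, that the file must begin with a `#cloud-config` line, and it flags tab characters used for indentation, which YAML does not allow. `precheck_cloud_config` is a hypothetical helper; it does not parse the YAML or check RancherOS-specific keys such as `rancher.network`, so a full check would also need a YAML parser (for example PyYAML's `yaml.safe_load`).

```python
def precheck_cloud_config(text: str) -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed.

    Hypothetical helper: checks only the header line and tab indentation.
    """
    problems = []
    lines = text.splitlines()
    # Assumption: the first line must start with the '#cloud-config' marker.
    if not lines or not lines[0].startswith("#cloud-config"):
        problems.append("line 1: file must start with '#cloud-config'")
    for number, line in enumerate(lines, start=1):
        # Leading whitespace of the line; YAML forbids tabs in indentation.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            problems.append(f"line {number}: tab character used for indentation")
    return problems
```

Usage: `print(precheck_cloud_config(open("cloud-config.yml").read()))` prints `[]` when both checks pass.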

Can you share your cloud-init file?

sheng-liang commented 9 years ago

@2devnull Your cloud-config file looks good to me. I modeled mine after yours:

cloud-config.yml

wicadmin commented 9 years ago

@sheng-liang @deniseschannon - thank you for the requested info.

What I found is that changing the Guest Operating System type makes a difference. I had originally selected 64-bit "Other" OS; I changed it to 64-bit and then 32-bit "Linux" and seemed to get further, but it still got stuck at another point (see attached). I guess my question is: what is the correct Guest Operating System to use for RancherOS?

[screenshot: rancher2]

sheng-liang commented 9 years ago

I select Linux and then "Other 3.x Linux (64-bit)".

I don't think this selection really matters that much though.

Are you able to boot up, log in using the default rancher user, and get network access? You should not have to set up cloud-config.yml.

wicadmin commented 9 years ago

0.4.0-rc11 worked. I gave it 1 GB of memory and a 1 GB disk, with "Other Linux (64-bit)" as the VM type, and used the same cloud-config file as before, unchanged. I did not try 0.3.3 again, as I'm hoping 0.4 will soon be the stable release.

On another note, and this probably isn't the place to ask these questions, but:

Thanks for your help.

wicadmin commented 9 years ago

BTW, it would be nice to get VMware Tools running as a system service.

e.g.: https://hub.docker.com/r/sergeyzh/vmware-tools/

sheng-liang commented 9 years ago

@2devnull Answers below:

how many cores should I give to this VM (i.e. that it will actually use)?

It will use as many cores as you give.

what is the recommended way to create shared storage which all the instances can read/write to concurrently? I have only a single disk that has ESXi installed and now RancherOS.

RancherOS from 4.0 will support NFS client. That's the best way to support shared storage.

wicadmin commented 9 years ago

RancherOS from 4.0 will support NFS client. That's the best way to support shared storage.

You mean 0.4.0, not 4.0 right?

BTW - is there a publicly available roadmap?

sheng-liang commented 9 years ago

@2devnull Right, 0.4.0.

AFAIK, there's no roadmap beyond the list of open issues.

wicadmin commented 9 years ago

So, since I'm on 0.4.0, I should have NFS client capability now, correct?

Now that RancherOS has an NFS client, should I allocate only enough disk for RancherOS to store its Docker images and containers, and set up another VM (say, running Debian) as an NFS server that exports shares from the bulk disk space, to be used as shared storage?