rancher / os

Tiny Linux distro that runs the entire OS as Docker containers
https://rancher.com/docs/os/v1.x/en/
Apache License 2.0

Passing autoformat as boot parameter causes kernel panic #1243

Open bachp opened 8 years ago

bachp commented 8 years ago

RancherOS Version: 0.6.1

Where are you running RancherOS? (bare-metal and virtualbox)

When passing the following parameters rancher.state.dev=LABEL=RANCHER_STATE rancher.state.autoformat=[/dev/sda] at boot time with an empty /dev/sda I get a kernel panic.

I tried multiple bare-metal machines and a VirtualBox VM using PXE as described in the manual.

Here is the panic message from the VirtualBox VM: [image]

The issue only occurs if an autoformat takes place. It doesn't happen if the state partition is already there and it doesn't happen either if the rancher.state.dev=LABEL=RANCHER_STATE parameter is omitted.

joshwget commented 8 years ago

It looks like the PXE docs were missing the rancher.state.formatzero flag. I updated the page to include this. If you add this flag to your setup then this should resolve the issue for you.

At the same time, it's definitely a bug that a kernel panic happens in this scenario.

bachp commented 8 years ago

@joshwget Thanks, I will give it a try.

bachp commented 8 years ago

@joshwget After adding rancher.state.formatzero I'm still getting the kernel panic.

joshwget commented 8 years ago

Is the device in rancher.state.autoformat wrong perhaps? This was something I accidentally did when I was testing this yesterday.

bachp commented 8 years ago

I used

rancher.state.dev=LABEL=RANCHER_STATE rancher.state.autoformat=[/dev/sda] rancher.state.formatzero rancher.cloud_init.datasources=['url:http://ipxe.lan/cloud-config']

If I boot with only

rancher.cloud_init.datasources=['url:http://ipxe.lan/cloud-config']

and then access /dev/sda I can see it and it contains no partitions or filesystems (I used wipefs before)

joshwget commented 8 years ago

The kernel panic here should be fixed by #1252. In v0.7.0 it's likely that state will still fail to be mounted, but at least we'll be able to boot RancherOS running from memory. This will make it easier to troubleshoot.

If you'd like, you can try this out with v0.7.0-rc2.

ChromoX commented 8 years ago

I'm having this exact issue as well.

I also don't see a v0.7.0-rc2 in the releases directory.

joshwget commented 8 years ago

@ChromoX It's on our releases page.

ChromoX commented 8 years ago

Is there any way to hook this up to iPXE easily? iPXE doesn't seem to support HTTPS URLs.

joshwget commented 8 years ago

GitHub supports HTTP for downloads, so you can just switch it to http:// for testing.

ChromoX commented 8 years ago

So I have RancherOS 0.7.0-rc2 loading up, but autoformat doesn't work.

[root@compute3 ~]# sudo ros --version
ros version v0.7.0-rc2

My cloud-config contains:

rancher:
  state:
    fstype: auto
    formatzero: true
    dev: LABEL=RANCHER_STATE
    autoformat:
      - /dev/sda
      - /dev/sdb

If I try to run the os-autoformat image:

[root@compute3 ~]# sudo system-docker run rancher/os-autoformat:v0.7.0-rc2
+ MAGIC='boot2docker, please format-me'
+ DEVS=(${AUTOFORMAT})

I've confirmed with dd that the first 1MB of the disks is 0s.
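For anyone wanting to repeat that check, here is a minimal sketch of one way to test whether the first 1 MiB of a device is all zeros. The helper name first_mib_is_zero is made up for illustration and is not part of RancherOS. The sketch uses head and tr rather than dd, and it assumes a head that supports -c (GNU, BSD and busybox do).

```shell
#!/bin/sh
# Sketch: succeed only if the first 1 MiB of a device (or file) is all zero bytes.
# first_mib_is_zero is a hypothetical helper name, not a RancherOS command.
first_mib_is_zero() {
    target=$1
    size=1048576
    # Refuse targets shorter than 1 MiB so a short read is not mistaken for zeros.
    actual=$(( $(head -c "$size" "$target" | wc -c) ))
    [ "$actual" -eq "$size" ] || return 2
    # Delete every NUL byte; any byte left over is non-zero data.
    nonzero=$(( $(head -c "$size" "$target" | tr -d '\000' | wc -c) ))
    [ "$nonzero" -eq 0 ]
}
```

Run as root against a real disk, for example first_mib_is_zero /dev/sda && echo zeroed. An exit status of 1 means non-zero data was found, and 2 means the target was shorter than 1 MiB or unreadable.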

If I try to run the os-cloudinit image:

[root@compute3 ~]# sudo system-docker run rancher/os-cloudinit:v0.7.0-rc2 
system-docker: Error response from daemon: Container command '/usr/bin/ros' not found or does not exist..

The ${AUTOFORMAT} environment variable has no devices listed. Putting a device path and re-running the docker image does not change anything.

Also it doesn't seem that you guys support auto-formatting multiple devices? https://github.com/rancher/os/blob/e58a6a54333bf87a42296bd08a51dcece3b00074/images/02-autoformat/auto-format.sh#L61

Any ideas?

joshwget commented 8 years ago

@ChromoX How are you setting your cloud-config? rancher.state.autoformat is designed to be used as a kernel parameter.
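A quick way to see whether the parameter actually reached the kernel is to look for it in /proc/cmdline. Below is a minimal sketch; cmdline_value is a hypothetical helper name, and it only matches key=value parameters, so a bare flag such as rancher.state.formatzero is not found by it.

```shell
#!/bin/sh
# Sketch: print the value of one key=value kernel parameter.
# cmdline_value is a hypothetical helper name. It reads /proc/cmdline unless a
# command-line string is passed as the second argument.
# The body runs in a subshell so that "set -f" does not leak to the caller.
cmdline_value() (
    key=$1
    cmdline=${2:-$(cat /proc/cmdline)}
    # Disable globbing: values such as [/dev/sda] look like glob patterns.
    set -f
    for word in $cmdline; do
        case $word in
            "$key"=*)
                # Strip the "key=" prefix and print the rest.
                printf '%s\n' "${word#"$key"=}"
                exit 0
                ;;
        esac
    done
    exit 1
)
```

On a booted machine, cmdline_value rancher.state.autoformat prints the device list if the parameter was passed on the kernel command line, and exits non-zero with no output otherwise.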

joshwget commented 8 years ago

@bachp I confirmed that a kernel panic no longer occurs in v0.7.0-rc4. I'm going to leave this issue open since we still haven't resolved why RancherOS was unable to mount state in your case.

ChromoX commented 8 years ago

@joshwget You were right. Wasn't using it as a kernel parameter. Thanks!

joshwget commented 8 years ago

We'll probably support autoformat as part of a cloud-config at some point in the future after #1175.

joshwget commented 7 years ago

@bachp Can you try this on v0.7.0 when you have the chance?

bachp commented 7 years ago

@joshwget I retried with v0.7.0, but I'm unable to perform autoformat. I tried the instructions at http://docs.rancher.com/os/running-rancheros/server/pxe/ and I also tried adding rancher.state.formatzero.

I also tried adding autoformat to cloud-init as described here: http://docs.rancher.com/os/storage/state-partition/ again without luck.

I used wipefs -a /dev/sda on the disk before. Also I see an "autoformat" job starting but nothing seems to happen.

janeczku commented 7 years ago

Same here. With 0.7.0, autoformat does not work when booting via iPXE. Kernel params:

rancher.state.dev=LABEL=RANCHER_STATE rancher.state.autoformat=[/dev/vda]

I couldn't find anything interesting in the debug logs other than this:

[image: boot]

Verified that disk is zeroed:

sudo od -A d -N 1048576 /dev/vda | head -n 3

0000000 000000 000000 000000 000000 000000 000000 000000 000000
*
1048576

joshwget commented 7 years ago

These seem like two separate issues. For @bachp the system is booting but autoformat isn't running.

@janeczku Do you have any other kernel parameters?

bachp commented 7 years ago

@joshwget I'm also booting via iPXE, so the issue might be related to the one reported by @janeczku. I used the parameters as described here: http://docs.rancher.com/os/running-rancheros/server/pxe/ with no additional kernel args.

janeczku commented 7 years ago

@joshwget

> These seem like two separate issues. For @bachp the system is booting but autoformat isn't running.

Same issue for me. The system boots normally. But no state partition is created.

$ cat /proc/cmdline
rancher.state.dev=LABEL=RANCHER_STATE rancher.state.autoformat=[/dev/vda] rancher.cloud_init.datasources=[ec2]

SvenDowideit commented 7 years ago

In my case, the new, unformatted disk wasn't all zeros, so the autoformat didn't trigger. For my testing use, I think I'd like a rancher.state.autoformat.onboot=true option.
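If a fresh disk is not all zeros, one workaround in the meantime is to zero its first 1 MiB by hand before booting. This is only a sketch: zero_first_mib is a hypothetical helper name, and the 1 MiB size is taken from the checks earlier in this thread rather than from any documented requirement. It destroys whatever is at the start of the target, so double-check the device path.

```shell
#!/bin/sh
# Sketch: overwrite the first 1 MiB of a target with zeros.
# DESTRUCTIVE when pointed at a real disk.
# zero_first_mib is a hypothetical helper name, not a RancherOS command.
zero_first_mib() {
    target=$1
    # conv=notrunc keeps the rest of a regular file intact; on a block
    # device it makes no difference.
    dd if=/dev/zero of="$target" bs=1048576 count=1 conv=notrunc 2>/dev/null
}
```

For example, zero_first_mib /dev/sda as root, then reboot with the autoformat kernel parameters.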

user-name-is-taken commented 4 years ago

I got this when I ran sudo ros config syslinux, set the parameters, and then rebooted. I'd been messing with the parameters before that though.