mikma / lxd-openwrt

Scripts for building LXD images from OpenWrt rootfs tarballs.
MIT License

The container shuts down instead of rebooting when running "reboot now" #14

Closed: rkkoszewski closed this issue 5 years ago.

rkkoszewski commented 5 years ago

Hi, I'm trying to reboot the OpenWrt container from inside the container, but it shuts down permanently instead of restarting. Rebooting is especially useful with the watchdog plugin, which can restart the router when an event happens or simply reboot it periodically.

I have tested an Alpine Linux container and a Debian 9 container, and in both cases, when I run "reboot now" inside the container, the container reboots and starts again without any issues.

I guess that the init or procd process somehow needs to signal the parent LXC monitor process that it is rebooting rather than shutting down (via ACPI?). Watching htop while rebooting an Alpine or Debian container, I could see the container's whole init process shut down too, so it does not seem to be a "hacky soft reboot"; rather, the LXC process properly restarts the container.

I'm running the container in LXC 3.1.0.

EDIT:

Possibly relevant: http://man7.org/linux/man-pages/man2/reboot.2.html (section "Behavior inside PID namespaces")
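The man page linked above states that when reboot() is called from a non-initial PID namespace, the namespace's init is terminated and wait(2) in the parent reports that it was killed by SIGHUP for LINUX_REBOOT_CMD_RESTART, or by SIGINT for LINUX_REBOOT_CMD_POWER_OFF and LINUX_REBOOT_CMD_HALT. That is presumably what lets a supervisor such as the LXC monitor tell a reboot from a shutdown. Below is a minimal C sketch of that check. It assumes nothing about LXC's actual code: classify_init_exit() and simulate_init() are hypothetical helpers, and the child simply kills itself with the signal so the sketch runs without creating real namespaces.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical labels for how a container's init ended, as seen by its parent. */
enum init_exit { INIT_REBOOT, INIT_HALT, INIT_OTHER };

/*
 * Per reboot(2), "Behavior inside PID namespaces": after reboot() in a
 * non-initial PID namespace, wait(2) in the parent reports that init was
 * killed by SIGHUP (LINUX_REBOOT_CMD_RESTART/RESTART2) or by SIGINT
 * (LINUX_REBOOT_CMD_POWER_OFF/HALT).
 */
static enum init_exit classify_init_exit(int status)
{
    if (WIFSIGNALED(status)) {
        switch (WTERMSIG(status)) {
        case SIGHUP:
            return INIT_REBOOT;
        case SIGINT:
            return INIT_HALT;
        default:
            break;
        }
    }
    /* A plain exit(0), or death by any other signal, carries no reboot request. */
    return INIT_OTHER;
}

/*
 * Hypothetical stand-in for the namespace's init: the child terminates itself
 * with `sig` (or exits normally when sig == 0), so the wait status can be
 * inspected without real namespaces. Returns the raw wait status, or -1.
 */
static int simulate_init(int sig)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (sig != 0) {
            sigset_t set;
            sigemptyset(&set);
            sigaddset(&set, sig);
            sigprocmask(SIG_UNBLOCK, &set, NULL); /* make sure it can be delivered */
            signal(sig, SIG_DFL);                 /* default action: terminate */
            raise(sig);
        }
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}
```

In this model, classify_init_exit(simulate_init(SIGHUP)) yields INIT_REBOOT, while a child that just exits yields INIT_OTHER, which is consistent with the symptom in this issue: an init that merely exits looks like a plain stop rather than a reboot.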

EDIT 2:

When I kill "procd" with SIGBUS, the container reboots successfully. Maybe the issue is with the reboot command?

mikma commented 5 years ago

Removing the patch 0003-docker-fix-problem-stopping-container.patch seems to solve the problem for me. Can you confirm?

The patch has been merged upstream and is needed for running in Docker. Hopefully it can be improved so that it works in both LXC/LXD and Docker.

rkkoszewski commented 5 years ago

Hi @mikma, thanks for looking into this. I had a look at the upstream patch: https://git.openwrt.org/?p=project/procd.git;a=commitdiff;h=832369078d818d19ab64051fdc8da9e06c90ad88

I think the cause is the missing reboot event when running in a container. One idea would be to add:

reboot(reboot_event);

before the

exit(0);

(It should not trigger a kernel panic.) I will test that tomorrow.
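A C sketch of the proposed ordering, under stated assumptions: it is not procd's actual code, the in_container flag stands in for however procd detects a container, and dry_run_ok()/dry_run_eperm() are hypothetical stand-ins for reboot(2) so the logic can be exercised without rebooting anything. Per reboot(2), a successful call inside a PID namespace terminates that namespace and does not return, and it fails with EPERM when the caller lacks CAP_SYS_BOOT; in that case the existing exit(0) path still runs.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/reboot.h>

/* Hypothetical result codes so the logic can be checked without rebooting anything. */
enum final_step { STEP_KERNEL_REBOOT, STEP_EXIT };

/* Same shape as reboot(2); injecting it lets a dry run replace the real call. */
typedef int (*reboot_fn)(int cmd);

/*
 * Sketch of the ordering proposed in this thread (not procd's actual code).
 * `in_container` is an assumed input standing in for procd's own detection.
 */
static enum final_step finish_shutdown(bool in_container, int reboot_event,
                                       reboot_fn do_reboot)
{
    if (!in_container) {
        /* Normal hardware/VM path: the kernel reboots or powers off. */
        do_reboot(reboot_event);
        return STEP_KERNEL_REBOOT;
    }
    /*
     * Proposed addition: inside a PID namespace a successful reboot(2)
     * terminates the namespace's init and never returns; the parent then
     * sees SIGHUP (RB_AUTOBOOT) or SIGINT (RB_POWER_OFF, RB_HALT_SYSTEM).
     */
    if (do_reboot(reboot_event) == 0)
        return STEP_KERNEL_REBOOT; /* only reachable with a dry-run stub */
    /* e.g. EPERM without CAP_SYS_BOOT: fall back to the existing exit(0). */
    return STEP_EXIT;
}

/* Hypothetical dry-run stand-ins for reboot(2) that record the command. */
static int last_cmd = -1;

static int dry_run_ok(int cmd)
{
    last_cmd = cmd;
    return 0;
}

static int dry_run_eperm(int cmd)
{
    last_cmd = cmd;
    errno = EPERM;
    return -1;
}
```

In a real init the call would pass reboot itself, e.g. finish_shutdown(in_container, reboot_event, reboot), followed by exit(0) when it returns STEP_EXIT. Keeping exit(0) as the fallback is what would presumably let the same path keep working where reboot(2) is not permitted; Docker, for instance, does not grant CAP_SYS_BOOT by default.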

EDIT:

I just tested the change, and reboot works now. Shutdown also still works as expected. I will submit a PR shortly. This should also work for Docker, but I have not tested it.

MateEke commented 5 years ago

This issue is still present for me. I built a new image and created a test container, but reboot still shuts the container down permanently.

What I did:

./build.sh -p "luci-theme-material luci-app-adblock luci-app-ddns luci-app-wol iptables-mod-checksum"
lxc image import bin/openwrt-18.06.4-x86-64-lxd.tar.gz --alias openwrt-18.06.4
lxc launch openwrt-18.06.4 router-test -c security.privileged="true"
lxc exec router-test passwd root

Then I tried to reboot from inside the container, both with the "reboot now" command and from the LuCI interface.

I'm using the latest version of your script:

git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean

Container config:

lxc config show router-test
architecture: x86_64
config:
  image.architecture: x86_64
  image.description: OpenWrt 18.06.4 r7808-ef686b7292
  image.os: OpenWrt
  image.release: 18.06.4
  security.privileged: "true"
  volatile.base_image: b548b330fc144bc4d4f07e3fe4469edf8839aaa71cb17f81207ed55be24a788f
  volatile.eth0.hwaddr: 00:16:3e:a3:3f:a9
  volatile.idmap.base: "0"
  volatile.idmap.next: '[]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: STOPPED
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: br0
    type: nic
ephemeral: false
profiles:
- default
stateful: false
description: ""
lxc --version
3.0.3

EDIT:

I have tested the Alpine image, the reboot now command works fine in it.

EDIT 2:

I removed the security.privileged: "true" setting and updated to lxc 3.16, but still had no success.

mikma commented 5 years ago

> I have built a new image, made a test container, but reboot still shuts down the container permanently.

It should work. Have you tried deleting bin/, build_dir/ and dl/, or starting from a fresh git clone? Changes to the patches do not automatically trigger a rebuild of the procd package, so you may still be using a procd package built from an older version of the patches.

MateEke commented 5 years ago

Thank you for the quick reply! Yes, it works now; I had already figured out myself that the problem was the cached procd package. I wanted to add another edit to my comment, but I had no IPv4 connectivity because of the broken if statement (db471ef). :)