hardkernel / linux

Linux kernel source tree

LXD containers won't start on Odroid C1 Ubuntu Linux - Operation not permitted - failed to remove sys_time capability #269

Closed digitalspider closed 7 years ago

digitalspider commented 7 years ago

LXD: https://linuxcontainers.org/lxd/getting-started-cli/

When I try to start an lxd container on Ubuntu 16.04 I get the following error:

$ lxc start alpine
error: Error calling 'lxd forkstart alpine /var/lib/lxd/containers /var/log/lxd/alpine/lxc.conf': err='exit status 1'
  lxc 20160216130334.855 ERROR lxc_conf - conf.c:setup_caps:2138 - Operation not permitted - failed to remove sys_time capability
  lxc 20160216130334.856 ERROR lxc_conf - conf.c:lxc_setup:3973 - failed to drop capabilities
  lxc 20160216130334.856 ERROR lxc_start - start.c:do_start:811 - Failed to setup container "alpine".
  lxc 20160216130334.856 ERROR lxc_sync - sync.c:__sync_wait:57 - An error occurred in another process (expected sequence number 3)
  lxc 20160216130334.915 ERROR lxc_start - start.c:__lxc_start:1346 - Failed to spawn container "alpine".
  lxc 20160216130335.514 ERROR lxc_conf - conf.c:run_buffer:405 - Script exited with status 1.
  lxc 20160216130335.514 ERROR lxc_start - start.c:lxc_fini:546 - Failed to run lxc.hook.post-stop for container "alpine".
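One way to look at the capability named in this error is to decode a capability mask such as the `CapBnd` line of `/proc/<pid>/status`. The sketch below is only a diagnostic aid, not the fix discussed in this thread; `cap_in_mask` is a hypothetical helper name and the sample mask is made up for illustration.

```shell
#!/bin/sh
# cap_in_mask MASK BIT: print "yes" if capability number BIT is set in the
# hexadecimal capability mask MASK (the format used in /proc/<pid>/status),
# otherwise print "no". Hypothetical helper, not part of LXC or LXD.
cap_in_mask() {
    mask=$1
    bit=$2
    if [ $(( (0x$mask >> bit) & 1 )) -eq 1 ]; then
        echo yes
    else
        echo no
    fi
}

# CAP_SYS_TIME is capability number 25 in <linux/capability.h>.
CAP_SYS_TIME=25

# Illustrative mask only (not taken from this thread): bits 0..36 set.
cap_in_mask 0000001fffffffff "$CAP_SYS_TIME"
# prints: yes

# On a real system the bounding set of the current shell could be checked with:
#   cap_in_mask "$(awk '/^CapBnd:/ {print $2}' /proc/self/status)" 25
```

A "yes" here only says the capability is still in the mask; it does not explain why dropping it was refused.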

My config:

# uname -a
Linux odr2 3.10.104-182 #1 SMP PREEMPT Tue Jan 31 23:12:12 UTC 2017 armv7l armv7l armv7l GNU/Linux
# cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.2 LTS (Xenial Xerus)"
...

After some research I found the issue is as per this:

The solution is as per:

Any chance this could be added to the next release of the Odroid linux kernel?

mdrjr commented 7 years ago

Hello @digitalspider Thank you for finding and debugging this. Next update will include the required patch.

digitalspider commented 7 years ago

Wow @mdrjr - thanks for fixing this so quickly. I just want to add some more notes from my testing.

I upgraded my kernel by running "sudo apt full-upgrade", so I'm now on 3.10.104-185.

uname -a
Linux odr2 3.10.104-185 #1 SMP PREEMPT Fri Feb 24 08:35:43 UTC 2017 armv7l armv7l armv7l GNU/Linux

When I tried to start the container, I got the following error, which suggests setting "lxc.aa_allow_incomplete = 1":

error: Error calling 'lxd forkstart alpine /var/lib/lxd/containers /var/log/lxd/alpine/lxc.conf': err='exit status 1'
  lxc 20170302000847.836 ERROR lxc_apparmor - lsm/apparmor.c:apparmor_process_label_set:220 - If you really want to start this container, set
  lxc 20170302000847.836 ERROR lxc_apparmor - lsm/apparmor.c:apparmor_process_label_set:221 - lxc.aa_allow_incomplete = 1
  lxc 20170302000847.836 ERROR lxc_apparmor - lsm/apparmor.c:apparmor_process_label_set:222 - in your container configuration file
  lxc 20170302000847.837 ERROR lxc_sync - sync.c:__sync_wait:57 - An error occurred in another process (expected sequence number 5)
  lxc 20170302000847.837 ERROR lxc_start - start.c:__lxc_start:1346 - Failed to spawn container "alpine".
  lxc 20170302000848.436 ERROR lxc_conf - conf.c:run_buffer:405 - Script exited with status 1.
  lxc 20170302000848.436 ERROR lxc_start - start.c:lxc_fini:546 - Failed to run lxc.hook.post-stop for container "alpine".

I solved this by running the following:

lxc config set alpine raw.lxc 'lxc.aa_allow_incomplete = 1'
lxc start <container>

This worked: I could start my container and keep it running, but it could not obtain an IP address.

Looking further into this inside the container, I got the following error. Note: I'm running this as root:

~ # service networking start
 * Starting networking ...
 *   eth0 ...
udhcpc: socket(AF_INET,3,255): Operation not permitted
 * ERROR: networking failed to start

# Just executing "udhcpc" gives the same error:
udhcpc: socket(AF_INET,3,255): Operation not permitted
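The numeric arguments in this message can be decoded. On Linux, socket type 3 is SOCK_RAW and protocol 255 is IPPROTO_RAW, and per capabilities(7) opening a raw socket needs CAP_NET_RAW. The sketch below only translates the numbers; `decode_socket_args` is a hypothetical helper, and whether a missing capability is really the cause here is not established in this thread.

```shell
#!/bin/sh
# decode_socket_args LINE: pull "socket(FAMILY,TYPE,PROTO)" out of an error
# line and print symbolic names for the numeric type and protocol.
# Hypothetical helper; it covers only the values relevant to this thread.
decode_socket_args() {
    args=$(printf '%s\n' "$1" | sed -n 's/.*socket(\([^)]*\)).*/\1/p')
    [ -n "$args" ] || return 1
    family=${args%%,*}
    rest=${args#*,}
    sock_type=${rest%%,*}
    proto=${rest#*,}
    case $sock_type in
        1) type_name=SOCK_STREAM ;;
        2) type_name=SOCK_DGRAM ;;
        3) type_name=SOCK_RAW ;;
        *) type_name="type $sock_type" ;;
    esac
    case $proto in
        255) proto_name=IPPROTO_RAW ;;
        *)   proto_name="protocol $proto" ;;
    esac
    echo "$family $type_name $proto_name"
}

decode_socket_args 'udhcpc: socket(AF_INET,3,255): Operation not permitted'
# prints: AF_INET SOCK_RAW IPPROTO_RAW
```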

Note: on my Ubuntu system this works fine and gives the following output; on the Odroid it fails.

~ # udhcpc
udhcpc: started, v1.26.2
udhcpc: sending discover
udhcpc: sending select for 192.168.x.xxx
udhcpc: lease of 192.168.x.xxx obtained, lease time 86400

Still investigating the root cause...

tobetter commented 7 years ago

@digitalspider I'm curious which type of network interface DHCP is trying to obtain an IP address on. If it is not a virtual network interface used for NAT on the ODROID, I suspect the container's configuration does not give it the proper capability for networking. If the same problem does not happen on the host, that is, you can get an IP address there without failure, the problem might be the container's privileges. I would therefore recommend looking at the 'lxc.cap.drop' and 'lxc.cap.keep' settings; you may need to allow the container to have the 'net_admin' capability.

NanWang0024 commented 7 years ago

Has this issue been solved by the latest kernel? I am still getting this error when I try to launch LXD on Ubuntu 14.04 with kernel 3.10.105+.

error: Error calling 'lxd forkstart test /var/lib/lxd/containers /var/log/lxd/test/lxc.conf': err='exit status 1'
  lxc 20170307135807.495 ERROR lxc_conf - conf.c:setup_caps:2138 - Operation not permitted - failed to remove sys_time capability
  lxc 20170307135807.495 ERROR lxc_conf - conf.c:lxc_setup:3973 - failed to drop capabilities
  lxc 20170307135807.495 ERROR lxc_start - start.c:do_start:811 - Failed to setup container "test".
  lxc 20170307135807.495 ERROR lxc_sync - sync.c:__sync_wait:57 - An error occurred in another process (expected sequence number 3)
  lxc 20170307135807.528 ERROR lxc_start - start.c:__lxc_start:1346 - Failed to spawn container "test".
  lxc 20170307135808.231 ERROR lxc_conf - conf.c:run_buffer:405 - Script exited with status 1.
  lxc 20170307135808.231 ERROR lxc_start - start.c:lxc_fini:546 - Failed to run lxc.hook.post-stop for container "test".

My config:

# uname -a
Linux odroidxu3-5 3.10.105+ #1 SMP PREEMPT Thu Mar 2 15:01:45 GMT 2017 armv7l armv7l armv7l GNU/Linux

# cat /etc/os-release
NAME="Ubuntu"
VERSION="14.04.5 LTS, Trusty Tahr"
...

digitalspider commented 7 years ago

@NanWang0024 - I think this was only done for odroidc, not odroidxu3. Check the source code on GitHub for that specific branch.

@tobetter - that was a great tip. Thank you. I am just using a normal wired network eth0, with lxd using a bridge lxdbr0, all default settings, only ipv4.

# cat /var/log/lxd/alpine/lxc.conf
lxc.cap.drop = sys_time sys_module sys_rawio mac_admin mac_override
lxc.mount.auto = proc:rw sys:rw cgroup:mixed
lxc.autodev = 1
lxc.pts = 1024
...
lxc.seccomp = /var/lib/lxd/security/seccomp/alpine
lxc.id_map = u 0 165536 65536
lxc.id_map = g 0 165536 65536
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = lxdbr0
lxc.network.hwaddr = 00:16:3e:47:bc:ff
lxc.network.name = eth0
lxc.rootfs.backend = dir
lxc.rootfs = /var/lib/lxd/containers/alpine/rootfs
lxc.mount.entry = /var/lib/lxd/shmounts/alpine dev/.lxd-mounts none bind,create=dir 0 0
lxc.aa_allow_incomplete = 1

I saw the following ticket, and thought that maybe the version I was on, 2.0.9, was too old:

So I updated to use the ppa:ubuntu-lxc/lxd-stable, as per https://launchpad.net/~ubuntu-lxc/+archive/ubuntu/lxd-stable, and now am on version 2.11.

It also means I now have the "lxc network" command.

lxc network list
+--------+----------+---------+---------+
|  NAME  |   TYPE   | MANAGED | USED BY |
+--------+----------+---------+---------+
| eth0   | physical | NO      | 0       |
+--------+----------+---------+---------+
| lxcbr0 | bridge   | NO      | 0       |
+--------+----------+---------+---------+
| lxdbr0 | bridge   | YES     | 1       |
+--------+----------+---------+---------+
root@odr2:/var/log/lxd/alpine# lxc network show lxdbr0
config:
  dns.mode: dynamic
  ipv4.address: 10.1.1.1/24
  ipv4.dhcp.ranges: 10.1.1.2-10.1.1.254
  ipv4.nat: "true"
  ipv6.address: none
name: lxdbr0
type: bridge
used_by:
- /1.0/containers/alpine
managed: true

As for the net_admin tip, I'm not that familiar with linux, but have read this:

I did try adding net_admin, but it seems to have made no difference:

# vi input
lxc.aa_allow_incomplete = 1
lxc.cap.drop = sys_time sys_module sys_rawio mac_admin mac_override net_admin
# cat input | lxc config set alpine raw.lxc -
# lxc config show alpine  # I can see the new config there.
# lxc stop alpine
# lxc start alpine
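A point worth checking in the snippet above: per lxc.container.conf(5), `lxc.cap.drop` names capabilities that are removed from the container, so listing net_admin there takes it away rather than granting it. The sketch below lists what a config drops; `dropped_caps` is a hypothetical helper, and the sample config simply repeats the lines from this comment.

```shell
#!/bin/sh
# dropped_caps: read an LXC config on stdin and print every capability named
# on an "lxc.cap.drop" line, one per line. Hypothetical helper.
dropped_caps() {
    sed -n 's/^[[:space:]]*lxc\.cap\.drop[[:space:]]*=[[:space:]]*//p' |
        tr -s ' \t' '\n\n' |
        sed '/^$/d'
}

# Same two lines as the "input" file shown above.
config='lxc.aa_allow_incomplete = 1
lxc.cap.drop = sys_time sys_module sys_rawio mac_admin mac_override net_admin'

if printf '%s\n' "$config" | dropped_caps | grep -qx net_admin; then
    echo "net_admin is in the drop list, so the container loses it"
fi
```

Whether net_admin is the capability udhcpc actually needs is a separate question that this check does not answer.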

More info about the network:

# ip link
8: lxdbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether fe:4b:ce:63:d9:d0 brd ff:ff:ff:ff:ff:ff
14: vethS04KHS: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master lxdbr0 state UP mode DEFAULT group default qlen 1000
    link/ether fe:4b:ce:63:d9:d0 brd ff:ff:ff:ff:ff:ff

Unfortunately still no IP address, although I have noticed a few warnings in the logs:

# lxc info --show-log alpine
Name: alpine
Remote: unix:/var/lib/lxd/unix.socket
Architecture: armv7l
Created: 2017/03/09 13:49 UTC
Status: Running
Type: persistent
Profiles: default
Pid: 12817
Ips:
  lo:   inet    127.0.0.1
  lo:   inet6   ::1
  eth0: inet6   fe80::216:3eff:fe47:bcff        veth5P59LW
Resources:
  Processes: 1
  CPU usage:
    CPU usage (in seconds): 6
  Memory usage:
    Memory (current): 396.00kB
    Memory (peak): 920.00kB
  Network usage:
    lo:
      Bytes received: 0B
      Bytes sent: 0B
      Packets received: 0
      Packets sent: 0
    sit0:
      Bytes received: 0B
      Bytes sent: 0B
      Packets received: 0
      Packets sent: 0
    eth0:
      Bytes received: 3.05kB
      Bytes sent: 468B
      Packets received: 23
      Packets sent: 6
    ip6tnl0:
      Bytes received: 0B
      Bytes sent: 0B
      Packets received: 0
      Packets sent: 0

Log:
lxc 20170309135956.884 WARN     lxc_apparmor - lsm/apparmor.c:apparmor_process_label_set:218 - Incomplete AppArmor support in your kernel
lxc 20170309135956.886 WARN     lxc_start - start.c:signal_handler:322 - Invalid pid for SIGCHLD. Received pid 12811, expected pid 12817

digitalspider commented 7 years ago

Actually, one other thing: on the host, AppArmor does not seem to be installed correctly:

# apparmor_status
apparmor module is loaded.
Could not open /sys/kernel/security/apparmor/profiles: No such file or directory

I don't know why, but that path does not exist on my Odroid (armhf). It does exist on my amd64 box.
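The same check can be scripted so it is easy to repeat on both machines. This is a minimal sketch; `apparmor_profiles_file` is a hypothetical helper, and it only tests for the path that `apparmor_status` complained about.

```shell
#!/bin/sh
# apparmor_profiles_file [DIR]: succeed if DIR contains a "profiles" entry.
# DIR defaults to the AppArmor directory under securityfs. Hypothetical helper.
apparmor_profiles_file() {
    dir=${1:-/sys/kernel/security/apparmor}
    [ -e "$dir/profiles" ]
}

if apparmor_profiles_file; then
    echo "AppArmor profile list is exposed"
else
    echo "AppArmor profile list is missing"
fi
```

A missing path is consistent with the "Incomplete AppArmor support in your kernel" warning in the container log, but it does not by itself say which kernel feature is absent.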

digitalspider commented 7 years ago

I think the problem is this: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1560583

My gut feeling now is that the above issue is preventing the profile from being created. This would be why the default "lxc.cap.drop" is also different:

# head lxc.conf  (on amd64)
lxc.cap.drop = sys_time sys_module sys_rawio

# head lxc.conf (on armhf)
lxc.cap.drop = sys_time sys_module sys_rawio mac_admin mac_override

I think the last two need to be removed, but using raw.lxc I can only add...
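The difference between the two default drop lists can be computed rather than read by eye. A small sketch, with `extra_caps` as a hypothetical helper and the two lists copied from the output above:

```shell
#!/bin/sh
# extra_caps A B: print the capabilities that appear in list B but not in
# list A, one per line. Hypothetical helper.
extra_caps() {
    for cap in $2; do
        case " $1 " in
            *" $cap "*) ;;
            *) echo "$cap" ;;
        esac
    done
}

amd64='sys_time sys_module sys_rawio'
armhf='sys_time sys_module sys_rawio mac_admin mac_override'

extra_caps "$amd64" "$armhf"
# prints mac_admin and mac_override, one per line
```

On removing entries: the lxc.container.conf man page describes an `lxc.cap.drop =` line with no value as clearing the drop list built up to that point. Whether that works through raw.lxc with this LXC version is an untested assumption here.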

Also see: http://forum.odroid.com/viewtopic.php?f=112&t=12410

tobetter commented 7 years ago

@digitalspider You are doing quite interesting things. :+1: Firstly, I would recommend building the LXC source code yourself instead of using the prebuilt package. There are two configure flags for AppArmor and SELinux: you could run './configure --disable-apparmor --disable-selinux' along with your other options. This disables the security code in LXC, so I believe you at least won't hit security-related problems while experimenting with your container. Obviously, you would enable them again, or go back to the official package, later. Secondly, I assume you have already checked whether the kernel is fully configured for LXC. There is a script, 'lxc-checkconfig', that you can run on the board; it reads '/proc/config.gz' to parse the kernel configuration. I hope it works for you.
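The kernel-configuration check mentioned above can also be approximated by hand for a few options. This is a rough sketch, not a replacement for the LXC script: `kconfig_state` is a hypothetical helper, the option list is a small illustrative selection, and `/proc/config.gz` only exists when the kernel was built with CONFIG_IKCONFIG_PROC.

```shell
#!/bin/sh
# kconfig_state OPTION [FILE]: print "y" or "m" if OPTION is enabled in a
# gzip-compressed kernel config, "unset" if it is not, and "unknown" if the
# file cannot be read. FILE defaults to /proc/config.gz. Hypothetical helper.
kconfig_state() {
    opt=$1
    file=${2:-/proc/config.gz}
    if [ ! -r "$file" ]; then
        echo unknown
        return 0
    fi
    val=$(gzip -dc "$file" | sed -n "s/^$opt=\([ym]\)\$/\1/p")
    echo "${val:-unset}"
}

# Illustrative selection of container-related options, not a complete list.
for option in CONFIG_NAMESPACES CONFIG_USER_NS CONFIG_NET_NS CONFIG_VETH \
              CONFIG_SECURITY_APPARMOR; do
    echo "$option: $(kconfig_state "$option")"
done
```

Note that an option being "y" only shows it was compiled in; the AppArmor warning in this thread suggests a feature can be present yet incomplete.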

digitalspider commented 7 years ago

Thanks @tobetter - will try. First time compiling a kernel. Fun fun. I've found some instructions in the last comment of this blog, which I will try to follow. I need to replace every odroidxu3-3.10.y with odroidc-3.10.y.

We'll see how this goes. I also found the config, which seems to have all the right settings.

But I will try this with the flags suggested and experiment until I get it right. Thanks.