kata-containers / runtime

Kata Containers version 1.x runtime (for version 2.x see https://github.com/kata-containers/kata-containers).
https://katacontainers.io/
Apache License 2.0

failure when selinux-enabled #784

Closed vbatts closed 3 years ago

vbatts commented 6 years ago

Description of problem

Running Docker with SELinux enabled ("selinux-enabled": true in /etc/docker/daemon.json) on a CentOS 7 host that also has SELinux enabled.
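The daemon-side precondition can be confirmed before reproducing. This is a minimal sketch, assuming the daemon configuration lives at /etc/docker/daemon.json as stated above; the helper name docker_selinux_enabled is hypothetical, and the check is a rough text match rather than a real JSON parse.

```shell
# Hypothetical helper: succeeds if FILE sets "selinux-enabled": true.
# Rough text match only; it does not parse JSON.
docker_selinux_enabled() {
    grep -Eq '"selinux-enabled"[[:space:]]*:[[:space:]]*true' "$1"
}

# Report the daemon setting if the file exists on this host.
conf=/etc/docker/daemon.json
if [ -f "$conf" ] && docker_selinux_enabled "$conf"; then
    echo "docker: selinux-enabled is true"
fi
```

The host side is checked separately with getenforce, which appears later in this thread.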

Expected result

a shell

Actual result

[root@infra0 ~]# docker run -it --rm --runtime=kata-runtime fedora bash
docker: Error response from daemon: OCI runtime create failed: rpc error: code = Unknown desc = selinux label is specified in config, but selinux is disabled or not supported: unknown.

Meta details

Running kata-collect-data.sh version 1.3.0-rc1 (commit 22aedc4) at 2018-09-25.04:51:31.258811160-0400.


Runtime is /bin/kata-runtime.

kata-env

Output of "/bin/kata-runtime kata-env":

[Meta]
  Version = "1.0.15"

[Runtime]
  Debug = false
  Path = "/usr/bin/kata-runtime"
  [Runtime.Version]
    Semver = "1.3.0-rc1"
    Commit = "22aedc4"
    OCI = "1.0.1"
  [Runtime.Config]
    Path = "/usr/share/defaults/kata-containers/configuration.toml"

[Hypervisor]
  MachineType = "pc"
  Version = "QEMU emulator version 2.11.0\nCopyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers"
  Path = "/usr/bin/qemu-lite-system-x86_64"
  BlockDeviceDriver = "virtio-scsi"
  Msize9p = 8192
  Debug = false
  UseVSock = false

[Image]
  Path = "/usr/share/kata-containers/kata-containers-image_clearlinux_1.3.0-rc1_agent_1ee972176ae.img"

[Kernel]
  Path = "/usr/share/kata-containers/vmlinuz-4.14.67.11-137.1.container"
  Parameters = ""

[Initrd]
  Path = ""

[Proxy]
  Type = "kataProxy"
  Version = "kata-proxy version 1.3.0-rc1-981fef4"
  Path = "/usr/libexec/kata-containers/kata-proxy"
  Debug = false

[Shim]
  Type = "kataShim"
  Version = "kata-shim version 1.3.0-rc1-9b2891c"
  Path = "/usr/libexec/kata-containers/kata-shim"
  Debug = false

[Agent]
  Type = "kata"

[Host]
  Kernel = "3.10.0-862.11.6.el7.x86_64"
  Architecture = "amd64"
  VMContainerCapable = true
  SupportVSocks = false
  [Host.Distro]
    Name = "CentOS Linux"
    Version = "7"
  [Host.CPU]
    Vendor = "AuthenticAMD"
    Model = "AMD A4-5000 APU with Radeon(TM) HD Graphics"

Runtime config files

Runtime default config files

/etc/kata-containers/configuration.toml
/usr/share/defaults/kata-containers/configuration.toml
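The report below shows the first of these files missing and the second one in use. A small sketch of that lookup, assuming the two locations are tried in the order listed; the helper name first_existing_config is hypothetical and is not part of kata-runtime.

```shell
# Hypothetical helper: print the first argument that is an existing file.
first_existing_config() {
    for candidate in "$@"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

first_existing_config \
    /etc/kata-containers/configuration.toml \
    /usr/share/defaults/kata-containers/configuration.toml \
    || echo "no configuration.toml found"
```

The authoritative answer for a given host is the Path reported under [Runtime.Config] in the kata-env output.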

Runtime config file contents

Config file /etc/kata-containers/configuration.toml not found

Output of "cat "/usr/share/defaults/kata-containers/configuration.toml"":

# Copyright (c) 2017-2018 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
#

# XXX: WARNING: this file is auto-generated.
# XXX:
# XXX: Source file: "cli/config/configuration.toml.in"
# XXX: Project:
# XXX:   Name: Kata Containers
# XXX:   Type: kata

[hypervisor.qemu]
path = "/usr/bin/qemu-lite-system-x86_64"
kernel = "/usr/share/kata-containers/vmlinuz.container"
image = "/usr/share/kata-containers/kata-containers.img"
machine_type = "pc"

# Optional space-separated list of options to pass to the guest kernel.
# For example, use `kernel_params = "vsyscall=emulate"` if you are having
# trouble running pre-2.15 glibc.
#
# WARNING: - any parameter specified here will take priority over the default
# parameter value of the same name used to start the virtual machine.
# Do not set values here unless you understand the impact of doing so as you
# may stop the virtual machine from booting.
# To see the list of default parameters, enable hypervisor debug, create a
# container and look for 'default-kernel-parameters' log entries.
kernel_params = ""

# Path to the firmware.
# If you want that qemu uses the default firmware leave this option empty
firmware = ""

# Machine accelerators
# comma-separated list of machine accelerators to pass to the hypervisor.
# For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"`
machine_accelerators=""

# Default number of vCPUs per SB/VM:
# unspecified or 0                --> will be set to 1
# < 0                             --> will be set to the actual number of physical cores
# > 0 <= number of physical cores --> will be set to the specified number
# > number of physical cores      --> will be set to the actual number of physical cores
default_vcpus = 1

# Default maximum number of vCPUs per SB/VM:
# unspecified or == 0             --> will be set to the actual number of physical cores or to the maximum number
#                                     of vCPUs supported by KVM if that number is exceeded
# > 0 <= number of physical cores --> will be set to the specified number
# > number of physical cores      --> will be set to the actual number of physical cores or to the maximum number
#                                     of vCPUs supported by KVM if that number is exceeded
# WARNING: Depending of the architecture, the maximum number of vCPUs supported by KVM is used when
# the actual number of physical cores is greater than it.
# WARNING: Be aware that this value impacts the virtual machine's memory footprint and CPU
# the hotplug functionality. For example, `default_maxvcpus = 240` specifies that until 240 vCPUs
# can be added to a SB/VM, but the memory footprint will be big. Another example, with
# `default_maxvcpus = 8` the memory footprint will be small, but 8 will be the maximum number of
# vCPUs supported by the SB/VM. In general, we recommend that you do not edit this variable,
# unless you know what are you doing.
default_maxvcpus = 0

# Bridges can be used to hot plug devices.
# Limitations:
# * Currently only pci bridges are supported
# * Until 30 devices per bridge can be hot plugged.
# * Until 5 PCI bridges can be cold plugged per VM.
#   This limitation could be a bug in qemu or in the kernel
# Default number of bridges per SB/VM:
# unspecified or 0   --> will be set to 1
# > 1 <= 5           --> will be set to the specified number
# > 5                --> will be set to 5
default_bridges = 1

# Default memory size in MiB for SB/VM.
# If unspecified then it will be set 2048 MiB.
#default_memory = 2048

# Disable block device from being used for a container's rootfs.
# In case of a storage driver like devicemapper where a container's 
# root file system is backed by a block device, the block device is passed
# directly to the hypervisor for performance reasons. 
# This flag prevents the block device from being passed to the hypervisor, 
# 9pfs is used instead to pass the rootfs.
disable_block_device_use = false

# Block storage driver to be used for the hypervisor in case the container
# rootfs is backed by a block device. This is either virtio-scsi or 
# virtio-blk.
block_device_driver = "virtio-scsi"

# Enable iothreads (data-plane) to be used. This causes IO to be
# handled in a separate IO thread. This is currently only implemented
# for SCSI.
#
enable_iothreads = false

# Enable pre allocation of VM RAM, default false
# Enabling this will result in lower container density
# as all of the memory will be allocated and locked
# This is useful when you want to reserve all the memory
# upfront or in the cases where you want memory latencies
# to be very predictable
# Default false
#enable_mem_prealloc = true

# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
# being allocated using huge pages.
# This is useful when you want to use vhost-user network
# stacks within the container. This will automatically 
# result in memory pre allocation
#enable_hugepages = true

# Enable swap of vm memory. Default false.
# The behaviour is undefined if mem_prealloc is also set to true
#enable_swap = true

# This option changes the default hypervisor and kernel parameters
# to enable debug output where available. This extra output is added
# to the proxy logs, but only when proxy debug is also enabled.
# 
# Default false
#enable_debug = true

# Disable the customizations done in the runtime when it detects
# that it is running on top a VMM. This will result in the runtime
# behaving as it would when running on bare metal.
# 
#disable_nesting_checks = true

# This is the msize used for 9p shares. It is the number of bytes 
# used for 9p packet payload.
#msize_9p = 8192

# If true and vsocks are supported, use vsocks to communicate directly
# with the agent and no proxy is started, otherwise use unix
# sockets and start a proxy to communicate with the agent.
# Default false
#use_vsock = true

# VFIO devices are hotplugged on a bridge by default. 
# Enable hotplugging on root bus. This may be required for devices with
# a large PCI bar, as this is a current limitation with hotplugging on 
# a bridge. This value is valid for "pc" machine type.
# Default false
#hotplug_vfio_on_root_bus = true

[factory]
# VM templating support. Once enabled, new VMs are created from template
# using vm cloning. They will share the same initial kernel, initramfs and
# agent memory by mapping it readonly. It helps speeding up new container
# creation and saves a lot of memory if there are many kata containers running
# on the same host.
#
# When disabled, new VMs are created from scratch.
#
# Default false
#enable_template = true

[proxy.kata]
path = "/usr/libexec/kata-containers/kata-proxy"

# If enabled, proxy messages will be sent to the system log
# (default: disabled)
#enable_debug = true

[shim.kata]
path = "/usr/libexec/kata-containers/kata-shim"

# If enabled, shim messages will be sent to the system log
# (default: disabled)
#enable_debug = true

[agent.kata]
# There is no field for this section. The goal is only to be able to
# specify which type of agent the user wants to use.

[runtime]
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
#enable_debug = true
#
# Internetworking model
# Determines how the VM should be connected to the
# the container network interface
# Options:
#
#   - bridged
#     Uses a linux bridge to interconnect the container interface to
#     the VM. Works for most cases except macvlan and ipvlan.
#
#   - macvtap
#     Used when the Container network interface can be bridged using
#     macvtap.
internetworking_model="macvtap"

# If enabled, the runtime will create opentracing.io traces and spans.
# (See https://www.jaegertracing.io/docs/getting-started).
# (default: disabled)
#enable_tracing = true

Image details

---
osbuilder:
  url: "https://github.com/kata-containers/osbuilder"
  version: "unknown"
rootfs-creation-time: "2018-09-13T22:26:12.472554196+0000Z"
description: "osbuilder rootfs"
file-format-version: "0.0.2"
architecture: "x86_64"
base-distro:
  name: "Clear"
  version: "25000"
  packages:
    default:
      - "iptables-bin"
      - "libudev0-shim"
      - "systemd"
    extra:

agent:
  url: "https://github.com/kata-containers/agent"
  name: "kata-agent"
  version: "1.3.0-rc1-1ee972176ae437bcace0a37227818c506bb64ba9"
  agent-is-init-daemon: "no"

Initrd details

No initrd


Logfiles

Runtime logs

No recent runtime problems found in system journal.

Proxy logs

No recent proxy problems found in system journal.

Shim logs

No recent shim problems found in system journal.


Container manager details

Have docker

Docker

Output of "docker version":

Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:23:03 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:25:29 2018
  OS/Arch:          linux/amd64
  Experimental:     false

Output of "docker info":

Containers: 16
 Running: 10
 Paused: 0
 Stopped: 6
Images: 135
Server Version: 18.06.1-ce
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: systemd
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: kata-runtime railcar runc runnc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
 seccomp
  Profile: default
 selinux
Kernel Version: 3.10.0-862.11.6.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.075GiB
Name: infra0.batts.lan
ID: DFJE:5L3J:GCJG:A6I3:3RH7:FQKG:B4SO:SGDZ:SEDN:RYMY:EBG6:Q3RH
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 92
 Goroutines: 97
 System Time: 2018-09-25T04:51:32.479918891-04:00
 EventsListeners: 0
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 infra0.batts.lan:5000
 regulator.batts.lan:5000
 127.0.0.0/8
Live Restore Enabled: false

Output of "systemctl show docker":

Type=notify
Restart=on-failure
NotifyAccess=main
RestartUSec=100ms
TimeoutStartUSec=0
TimeoutStopUSec=1min 30s
WatchdogUSec=0
WatchdogTimestamp=Tue 2018-09-25 04:43:51 EDT
WatchdogTimestampMonotonic=2488936406806
StartLimitInterval=60000000
StartLimitBurst=3
StartLimitAction=none
FailureAction=none
PermissionsStartOnly=no
RootDirectoryStartOnly=no
RemainAfterExit=no
GuessMainPID=yes
MainPID=28302
ControlPID=0
FileDescriptorStoreMax=0
StatusErrno=0
Result=success
ExecMainStartTimestamp=Tue 2018-09-25 04:43:10 EDT
ExecMainStartTimestampMonotonic=2488895299462
ExecMainExitTimestampMonotonic=0
ExecMainPID=28302
ExecMainCode=0
ExecMainStatus=0
ExecStart={ path=/usr/bin/dockerd ; argv[]=/usr/bin/dockerd ; ignore_errors=no ; start_time=[Tue 2018-09-25 04:43:10 EDT] ; stop_time=[n/a] ; pid=28302 ; code=(null) ; status=0/0 }
ExecReload={ path=/bin/kill ; argv[]=/bin/kill -s HUP $MAINPID ; ignore_errors=no ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
Slice=system.slice
ControlGroup=/system.slice/docker.service
MemoryCurrent=103497728
TasksCurrent=206
Delegate=yes
CPUAccounting=no
CPUShares=18446744073709551615
StartupCPUShares=18446744073709551615
CPUQuotaPerSecUSec=infinity
BlockIOAccounting=no
BlockIOWeight=18446744073709551615
StartupBlockIOWeight=18446744073709551615
MemoryAccounting=no
MemoryLimit=18446744073709551615
DevicePolicy=auto
TasksAccounting=no
TasksMax=18446744073709551615
UMask=0022
LimitCPU=18446744073709551615
LimitFSIZE=18446744073709551615
LimitDATA=18446744073709551615
LimitSTACK=18446744073709551615
LimitCORE=18446744073709551615
LimitRSS=18446744073709551615
LimitNOFILE=18446744073709551615
LimitAS=18446744073709551615
LimitNPROC=18446744073709551615
LimitMEMLOCK=65536
LimitLOCKS=18446744073709551615
LimitSIGPENDING=28633
LimitMSGQUEUE=819200
LimitNICE=0
LimitRTPRIO=0
LimitRTTIME=18446744073709551615
OOMScoreAdjust=0
Nice=0
IOScheduling=0
CPUSchedulingPolicy=0
CPUSchedulingPriority=0
TimerSlackNSec=50000
CPUSchedulingResetOnFork=no
NonBlocking=no
StandardInput=null
StandardOutput=journal
StandardError=inherit
TTYReset=no
TTYVHangup=no
TTYVTDisallocate=no
SyslogPriority=30
SyslogLevelPrefix=yes
SecureBits=0
CapabilityBoundingSet=18446744073709551615
AmbientCapabilities=0
MountFlags=0
PrivateTmp=no
PrivateNetwork=no
PrivateDevices=no
ProtectHome=no
ProtectSystem=no
SameProcessGroup=no
IgnoreSIGPIPE=yes
NoNewPrivileges=no
SystemCallErrorNumber=0
RuntimeDirectoryMode=0755
KillMode=process
KillSignal=15
SendSIGKILL=yes
SendSIGHUP=no
Id=docker.service
Names=docker.service
Requires=basic.target
Wants=network-online.target system.slice
WantedBy=multi-user.target
Conflicts=shutdown.target
Before=multi-user.target shutdown.target
After=systemd-journald.socket system.slice network-online.target basic.target firewalld.service
Documentation=https://docs.docker.com
Description=Docker Application Container Engine
LoadState=loaded
ActiveState=active
SubState=running
FragmentPath=/usr/lib/systemd/system/docker.service
UnitFileState=enabled
UnitFilePreset=disabled
InactiveExitTimestamp=Tue 2018-09-25 04:43:10 EDT
InactiveExitTimestampMonotonic=2488895299653
ActiveEnterTimestamp=Tue 2018-09-25 04:43:51 EDT
ActiveEnterTimestampMonotonic=2488936407049
ActiveExitTimestamp=Tue 2018-09-25 04:42:58 EDT
ActiveExitTimestampMonotonic=2488882917613
InactiveEnterTimestamp=Tue 2018-09-25 04:43:10 EDT
InactiveEnterTimestampMonotonic=2488895270128
CanStart=yes
CanStop=yes
CanReload=yes
CanIsolate=no
StopWhenUnneeded=no
RefuseManualStart=no
RefuseManualStop=no
AllowIsolate=no
DefaultDependencies=yes
OnFailureJobMode=replace
IgnoreOnIsolate=no
IgnoreOnSnapshot=no
NeedDaemonReload=no
JobTimeoutUSec=0
JobTimeoutAction=none
ConditionResult=yes
AssertResult=yes
ConditionTimestamp=Tue 2018-09-25 04:43:10 EDT
ConditionTimestampMonotonic=2488895295959
AssertTimestamp=Tue 2018-09-25 04:43:10 EDT
AssertTimestampMonotonic=2488895295960
Transient=no

No kubectl


Packages

No dpkg

Have rpm

Output of "rpm -qa|egrep "(cc-oci-runtime|cc-runtime|runv|kata-proxy|kata-runtime|kata-shim|kata-containers-image|linux-container|qemu-)"":

qemu-lite-data-2.11.0+git.f886228056-50.1.x86_64
kata-proxy-bin-1.3.0~rc1+git.981fef4-34.1.x86_64
qemu-vanilla-2.11.2+git.0982a56a55-48.1.x86_64
kata-containers-image-1.3.0~rc1-34.1.x86_64
qemu-vanilla-bin-2.11.2+git.0982a56a55-48.1.x86_64
kata-shim-bin-1.3.0~rc1+git.9b2891c-35.1.x86_64
ipxe-roms-qemu-20170123-1.git4e85b27.el7_4.1.noarch
qemu-vanilla-data-2.11.2+git.0982a56a55-48.1.x86_64
qemu-lite-2.11.0+git.f886228056-50.1.x86_64
kata-proxy-1.3.0~rc1+git.981fef4-34.1.x86_64
kata-runtime-1.3.0~rc1+git.22aedc4-49.1.x86_64
qemu-img-1.5.3-156.el7_5.5.x86_64
qemu-kvm-common-1.5.3-156.el7_5.5.x86_64
qemu-lite-bin-2.11.0+git.f886228056-50.1.x86_64
kata-linux-container-4.14.67.11-137.1.x86_64
qemu-kvm-1.5.3-156.el7_5.5.x86_64
kata-shim-1.3.0~rc1+git.9b2891c-35.1.x86_64
libvirt-daemon-driver-qemu-3.9.0-14.el7_5.7.x86_64

grahamwhaley commented 6 years ago

Hi @vbatts. This is a known missing feature right now. It has been discussed recently, but there indeed seems to be little evidence or trail of that here on GitHub. For a start, I would expect to find an entry in https://github.com/kata-containers/documentation/blob/master/Limitations.md (/me makes note...). There was some discussion when this https://github.com/kata-containers/documentation/issues/222 was being tried.

@xzr - do you remember if/where we got to with selinux kata discussion?

xzr commented 6 years ago

@grahamwhaley I think the consensus was that "someone will implement it when they have a chance" :P

grahamwhaley commented 6 years ago

Well, I created https://github.com/kata-containers/documentation/pull/253, but it is very slim. I had a look around the docker docs and a few bits of the code (dockerd, runc), but I am finding it hard to locate anything meaty or definitive on how this is currently handled (architecturally, for instance). Any pointers on what that setting actually enables, and where, are most welcome @vbatts ;-), so we can use those in future 'how and where do we enable this' discussions.

vbatts commented 6 years ago

Most of it is ensuring that the container's filesystem view and the execution of commands happen in the correct SELinux context. @rhatdan can give more pointers. Some host filesystems handle the SELinux context better than others (e.g. btrfs doesn't have "native" support, so it requires a recursive restorecon -R, which causes a copy-up). Some bits will be different since the container executes inside qemu; thankfully there is precedent for running qemu on an SELinux-enabled host (I don't have links for this offhand). The other piece not discussed above is that this is only about executing the container from the host. It does not get into making the guest kernel/system SELinux-enabled. That's an adventure for another day.

c3d commented 4 years ago

Same issue with podman.

Workaround: podman run --security-opt label=disable
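Spelled out against the podman invocation used later in this thread, the workaround looks roughly like the sketch below. The wrapper name kata_run_unlabeled is hypothetical, and it only prints the command (dry run). Note that label=disable turns off SELinux label separation for that container, so it trades the failure for less confinement.

```shell
# Print (dry run) the podman invocation from this thread with SELinux
# labeling disabled for the container. Remove "echo" to actually run it.
kata_run_unlabeled() {
    echo podman run --security-opt label=disable \
        --runtime /usr/bin/kata-runtime "$@"
}

kata_run_unlabeled -it alpine sh
```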

rhatdan commented 4 years ago

Could you give me the AVC messages you are seeing?

c3d commented 4 years ago

@rhatdan Here it is with log level set to info:

# getenforce
Enforcing

# podman run --log-level=info --runtime /usr/bin/kata-runtime -it alpine sh
WARN[0000] Not using native diff for overlay, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled 
INFO[0001] Found CNI network podman (type=bridge) at /etc/cni/net.d/87-podman-bridge.conflist 
INFO[0001] Got pod network &{Name:dazzling_lehmann Namespace:dazzling_lehmann ID:8b2afe344bfc47388be27bdfe2515f784421f8b6dbff6095c2989d05e7231dd6 NetNS:/var/run/netns/cni-bb772265-24b4-6a97-2720-7bc9aa8de714 Networks:[] RuntimeConfig:map[podman:{IP: PortMappings:[] Bandwidth:<nil> IpRanges:[]}]} 
INFO[0001] About to add CNI network cni-loopback (type=loopback) 
INFO[0001] Got pod network &{Name:dazzling_lehmann Namespace:dazzling_lehmann ID:8b2afe344bfc47388be27bdfe2515f784421f8b6dbff6095c2989d05e7231dd6 NetNS:/var/run/netns/cni-bb772265-24b4-6a97-2720-7bc9aa8de714 Networks:[] RuntimeConfig:map[podman:{IP: PortMappings:[] Bandwidth:<nil> IpRanges:[]}]} 
INFO[0001] About to add CNI network podman (type=bridge) 
INFO[0002] Running conmon under slice machine.slice and unitName libpod-conmon-8b2afe344bfc47388be27bdfe2515f784421f8b6dbff6095c2989d05e7231dd6.scope 
INFO[0007] Got pod network &{Name:dazzling_lehmann Namespace:dazzling_lehmann ID:8b2afe344bfc47388be27bdfe2515f784421f8b6dbff6095c2989d05e7231dd6 NetNS:/var/run/netns/cni-bb772265-24b4-6a97-2720-7bc9aa8de714 Networks:[] RuntimeConfig:map[podman:{IP: PortMappings:[] Bandwidth:<nil> IpRanges:[]}]} 
INFO[0007] About to del CNI network podman (type=bridge) 
Error: rpc error: code = Unknown desc = selinux label is specified in config, but selinux is disabled or not supported: OCI runtime error

For now, I have interpreted this as "not supported"

c3d commented 4 years ago

@rhatdan Here is what I can find in audit.log:


type=ANOM_PROMISCUOUS msg=audit(1575020565.541:2970): dev=vethd7d77f80 prom=256 old_prom=0 auid=0 uid=0 gid=0 ses=10AUID="root" UID="root" GID="root"
type=NETFILTER_CFG msg=audit(1575020565.561:2971): table=nat family=2 entries=138
type=NETFILTER_CFG msg=audit(1575020565.563:2972): table=nat family=2 entries=140
type=NETFILTER_CFG msg=audit(1575020565.566:2973): table=nat family=2 entries=141
type=NETFILTER_CFG msg=audit(1575020565.569:2974): table=nat family=2 entries=142
type=NETFILTER_CFG msg=audit(1575020565.590:2975): table=filter family=2 entries=255
type=NETFILTER_CFG msg=audit(1575020565.593:2976): table=filter family=2 entries=256
type=NETFILTER_CFG msg=audit(1575020568.818:2977): table=filter family=2 entries=257
type=NETFILTER_CFG msg=audit(1575020568.819:2978): table=filter family=2 entries=256
type=NETFILTER_CFG msg=audit(1575020568.827:2979): table=nat family=2 entries=143
type=NETFILTER_CFG msg=audit(1575020568.829:2980): table=nat family=2 entries=145
type=NETFILTER_CFG msg=audit(1575020568.830:2981): table=nat family=2 entries=143
type=NETFILTER_CFG msg=audit(1575020568.833:2982): table=nat family=2 entries=145
type=NETFILTER_CFG msg=audit(1575020568.834:2983): table=nat family=10 entries=133
type=NETFILTER_CFG msg=audit(1575020568.837:2984): table=nat family=10 entries=135
type=NETFILTER_CFG msg=audit(1575020568.838:2985): table=nat family=10 entries=133
type=NETFILTER_CFG msg=audit(1575020568.840:2986): table=nat family=10 entries=135
type=ANOM_PROMISCUOUS msg=audit(1575020568.854:2987): dev=vethd7d77f80 prom=0 old_prom=256 auid=0 uid=0 gid=0 ses=10AUID="root" UID="root" GID="root"
type=NETFILTER_CFG msg=audit(1575020568.865:2988): table=nat family=2 entries=143
type=NETFILTER_CFG msg=audit(1575020568.873:2989): table=nat family=2 entries=142
type=NETFILTER_CFG msg=audit(1575020568.875:2990): table=nat family=2 entries=140

rhatdan commented 4 years ago

Looks like kata has to be rebuilt with SELinux support. Or at least to ignore the label when built without SELinux support.

grahamwhaley commented 4 years ago

@rhatdan - as noted earlier in the thread, the kata limitations document says kata does not currently support the selinux option. It's not quite as simple as rebuild Kata to turn it on or ignore it - it's not coded up in kata.... And then, would you want kata to silently ignore an selinux option if it was passed in? I'm not sure. I'd not call that the 'path of least surprise'.

Now, if somebody wants to undertake coding up selinux support in kata - that'd be great, and I'm sure there are folks who would help discuss and review :-)

rhatdan commented 4 years ago

I would just warn in Kata that this is not currently supported. There is no mechanism for Podman to know, and forcing users to understand the difference is difficult, and perhaps impossible. Currently if you tell --security-opt label:disabled, does kata work?

c3d commented 4 years ago

@rhatdan

Currently if you tell --security-opt label:disabled, does kata work?

Yes, it does. See also related Bugzilla

rhatdan commented 4 years ago

Ok, then I guess podman does not send down a label in that case.

The difficult thing is that this is hard for users to understand, i.e. some containers run fine with SELinux enabled, but Kata fails.

One question I would have is: what is the label of the processes running the VM?

If it is running qemu, what label does ps -eZ | grep qemu show?

We really should get this labeled correctly, so we could take advantage of SELinux separation on VMs.

fidencio commented 4 years ago

@rhatdan,

Providing you the info asked in December:

# ps -eZ | grep qemu
unconfined_u:system_r:container_runtime_t:s0 5892 ? 00:00:03 qemu-kvm

rhatdan commented 4 years ago

So this means that podman executed qemu-kvm directly. What AVCs are you seeing when you run this? We could transition qemu-kvm to a better domain, like svirt_t.
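The domain being discussed is the third colon-separated field of the label printed by ps -eZ. A small sketch for pulling it out, assuming the usual user:role:type:level layout; the helper name selinux_type is hypothetical.

```shell
# Hypothetical helper: print the type (third field) of an SELinux context
# of the form user:role:type:level.
selinux_type() {
    printf '%s\n' "$1" | cut -d: -f3
}

# Label taken from the ps -eZ output posted above.
selinux_type 'unconfined_u:system_r:container_runtime_t:s0'

# On a live host, something like this feeds it the qemu label:
#   selinux_type "$(ps -eZ | awk '/qemu/ {print $1; exit}')"
```

For the label above this prints container_runtime_t, i.e. qemu-kvm is still running in the container runtime's domain rather than a VM-specific one.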

fidencio commented 4 years ago

@rhatdan, I don't see any particular AVC, just basically the same log as posted by @c3d in this comment: https://github.com/kata-containers/runtime/issues/784#issuecomment-559726072
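One way to check an audit log excerpt for denials is to count the records whose type is AVC. The sketch below reads the log from stdin; the helper name count_avc is hypothetical, and the sample input is two records copied from the audit.log excerpt earlier in this thread.

```shell
# Count SELinux denial records (type=AVC) in an audit log read from stdin.
# grep -c exits non-zero when the count is 0, so normalise the status.
count_avc() {
    grep -c '^type=AVC ' || true
}

count_avc <<'EOF'
type=NETFILTER_CFG msg=audit(1575020565.561:2971): table=nat family=2 entries=138
type=NETFILTER_CFG msg=audit(1575020565.563:2972): table=nat family=2 entries=140
EOF
```

A count of 0 is what one would expect if the runtime rejects the label itself, before any access is attempted that the kernel could deny.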

rhatdan commented 4 years ago

Can we get Kata to just warn rather than throw an error? I don't want to put something into Podman to identify which container runtimes support which features. I would figure there are other parts of the OCI spec that Kata ignores, since it does not currently implement this.
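The two behaviours under discussion (fail as today, or warn and continue) can be put side by side. This is only an illustration of the decision, not Kata's code: the function name, its arguments and the "strict"/"warn" modes are all hypothetical. The error text is the one quoted in this issue.

```shell
# Sketch of the two behaviours discussed in this thread. All names are
# hypothetical; this is not Kata's code.
#   $1: SELinux process label from the container config (may be empty)
#   $2: "yes" if the runtime can apply labels, anything else otherwise
#   $3: "strict" to fail (behaviour reported here) or "warn" to continue
check_selinux_label() {
    if [ -z "$1" ] || [ "$2" = "yes" ]; then
        return 0
    fi
    if [ "$3" = "warn" ]; then
        echo "warning: selinux label ignored: not supported by this runtime" >&2
        return 0
    fi
    echo "selinux label is specified in config, but selinux is disabled or not supported" >&2
    return 1
}

# Illustrative label only; not taken from a real container.
check_selinux_label 'system_u:system_r:container_t:s0:c1,c2' no warn \
    && echo "container would start"
```

In "warn" mode the container starts without the confinement the caller asked for, which is the trade-off the rest of the thread argues about.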

rhatdan commented 4 years ago

If we really want to examine this, what I believe Kata should be doing with SELinux is launching qemu (or whatever process launches the VM) with an SELinux label. In the best scenario it could launch the process as svirt_t:MCS, where it figures out the type by calling virtual_domain_context() and takes the MCS label from the OCI spec. This might not work, though, since the image label might not be correct.
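As a rough illustration of the svirt_t:MCS idea, the sketch below keeps the user, role and MCS level of a label and swaps only the type. The helper name with_type and the sample label are hypothetical, and svirt_t is hardcoded here only for the example; the comment above suggests looking the domain up rather than assuming it.

```shell
# Hypothetical helper: keep user, role and MCS level of a context, but
# replace the type. Expects user:role:type:level, where level may itself
# contain colons (for example s0:c1,c2).
with_type() (
    user=$(printf '%s\n' "$1" | cut -d: -f1)
    role=$(printf '%s\n' "$1" | cut -d: -f2)
    level=$(printf '%s\n' "$1" | cut -d: -f4-)
    printf '%s:%s:%s:%s\n' "$user" "$role" "$2" "$level"
)

# Illustrative label only; not taken from a real container.
with_type 'system_u:system_r:container_t:s0:c1,c2' svirt_t
```

Keeping the MCS categories is what would preserve per-container separation between VMs, assuming the image files carry a label that the new domain is allowed to read.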

amshinde commented 4 years ago

@rhatdan I have opened a PR (https://github.com/kata-containers/runtime/pull/2443) to disable selinux while support for it is added. This should unblock running Kata on systems with selinux enforced.

grahamwhaley commented 4 years ago

just for ref @rhatdan , afaik kata is not missing many OCI features, and afaik those it does not support, it fails on. Limitations documented here. I'm not a fan of fairly-silently ignoring a security feature request. I'm more a fan of the 'path of least surprise'.

rhatdan commented 4 years ago

Sure, we have to work on getting SELinux implemented. But we are going to need a different strategy to get this going, mainly because the container_t label will only work for Namespace based containers, and will not work for KVM based containers.

I hope to have some more time to try to play with Kata and SELinux.