sipwise / rtpengine

The Sipwise media proxy for Kamailio
GNU General Public License v3.0

in-kernel packet forwarding within an lxc container #1507

Open jpyle490 opened 2 years ago

jpyle490 commented 2 years ago

I recently upgraded from 9.4.1.2 to 10.4.1.6 and it broke my ability to use in-kernel packet forwarding within an lxc container. I also upgraded the host and container from Debian 10 to 11.

I compiled Debian packages. I have ngcp-rtpengine-iptables and ngcp-rtpengine-kernel-dkms installed on the host, and ngcp-rtpengine-daemon and ngcp-rtpengine-utils installed in the container. The xt_RTPENGINE kernel module is loaded on the host and it is visible in the container with lsmod. The iptables configuration in the container defines the rtpengine chain in the filter table, and points traffic to it with -A INPUT -p udp -j rtpengine. I believe this is all standard stuff.

This was enough to get functional in-kernel packet forwarding on 9.4.1.2 on Debian 10. Now it is not. Current journalctl logs on Debian 11 with 10.4.1.6:

Jun 21 21:57:58 hostname systemd[1]: Starting NGCP RTP/media Proxy Daemon...
Jun 21 21:57:58 hostname ngcp-rtpengine-iptables-setup[1614]: modprobe: FATAL: Module xt_RTPENGINE not found in directory /lib/modules/5.10.0-13-amd64
Jun 21 21:57:58 hostname ngcp-rtpengine-iptables-setup[1620]: iptables v1.8.7 (nf_tables): unknown option "--id"
Jun 21 21:57:58 hostname ngcp-rtpengine-iptables-setup[1620]: Try `iptables -h' or 'iptables --help' for more information.
Jun 21 21:57:58 hostname ngcp-rtpengine-iptables-setup[1626]: ip6tables v1.8.7 (nf_tables): unknown option "--id"
Jun 21 21:57:58 hostname ngcp-rtpengine-iptables-setup[1626]: Try `ip6tables -h' or 'ip6tables --help' for more information.
Jun 21 21:57:59 hostname rtpengine[1627]: ERR: [core] FAILED TO DELETE KERNEL TABLE 0 (Permission denied), KERNEL FORWARDING DISABLED
Jun 21 21:57:59 hostname rtpengine[1627]: CRIT: [core] Userspace fallback disallowed - exiting
Jun 21 21:57:59 hostname systemd[1]: ngcp-rtpengine-daemon.service: Main process exited, code=exited, status=255/EXCEPTION
Jun 21 21:57:59 hostname ngcp-rtpengine-iptables-setup[1639]: rmmod: ERROR: ../libkmod/libkmod-module.c:799 kmod_module_remove_module() could not remove 'xt_RTPENGINE': Operation not permitted
Jun 21 21:57:59 hostname ngcp-rtpengine-iptables-setup[1639]: rmmod: ERROR: could not remove module xt_RTPENGINE: Operation not permitted
Jun 21 21:57:59 hostname ngcp-rtpengine-iptables-setup[1643]: rmmod: ERROR: ../libkmod/libkmod-module.c:799 kmod_module_remove_module() could not remove 'xt_RTPENGINE': Operation not permitted
Jun 21 21:57:59 hostname ngcp-rtpengine-iptables-setup[1643]: rmmod: ERROR: could not remove module xt_RTPENGINE: Operation not permitted
Jun 21 21:57:59 hostname ngcp-rtpengine-iptables-setup[1647]: rmmod: ERROR: ../libkmod/libkmod-module.c:799 kmod_module_remove_module() could not remove 'xt_RTPENGINE': Operation not permitted
Jun 21 21:57:59 hostname ngcp-rtpengine-iptables-setup[1647]: rmmod: ERROR: could not remove module xt_RTPENGINE: Operation not permitted
Jun 21 21:57:59 hostname ngcp-rtpengine-iptables-setup[1651]: rmmod: ERROR: ../libkmod/libkmod-module.c:799 kmod_module_remove_module() could not remove 'xt_RTPENGINE': Operation not permitted
Jun 21 21:57:59 hostname ngcp-rtpengine-iptables-setup[1651]: rmmod: ERROR: could not remove module xt_RTPENGINE: Operation not permitted
Jun 21 21:58:00 hostname ngcp-rtpengine-iptables-setup[1655]: rmmod: ERROR: ../libkmod/libkmod-module.c:799 kmod_module_remove_module() could not remove 'xt_RTPENGINE': Operation not permitted
Jun 21 21:58:00 hostname ngcp-rtpengine-iptables-setup[1655]: rmmod: ERROR: could not remove module xt_RTPENGINE: Operation not permitted
Jun 21 21:58:00 hostname ngcp-rtpengine-iptables-setup[1659]: rmmod: ERROR: ../libkmod/libkmod-module.c:799 kmod_module_remove_module() could not remove 'xt_RTPENGINE': Operation not permitted
Jun 21 21:58:00 hostname ngcp-rtpengine-iptables-setup[1659]: rmmod: ERROR: could not remove module xt_RTPENGINE: Operation not permitted
Jun 21 21:58:00 hostname ngcp-rtpengine-iptables-setup[1663]: rmmod: ERROR: ../libkmod/libkmod-module.c:799 kmod_module_remove_module() could not remove 'xt_RTPENGINE': Operation not permitted
Jun 21 21:58:00 hostname ngcp-rtpengine-iptables-setup[1663]: rmmod: ERROR: could not remove module xt_RTPENGINE: Operation not permitted
Jun 21 21:58:00 hostname ngcp-rtpengine-iptables-setup[1667]: rmmod: ERROR: ../libkmod/libkmod-module.c:799 kmod_module_remove_module() could not remove 'xt_RTPENGINE': Operation not permitted
Jun 21 21:58:00 hostname ngcp-rtpengine-iptables-setup[1667]: rmmod: ERROR: could not remove module xt_RTPENGINE: Operation not permitted
Jun 21 21:58:00 hostname ngcp-rtpengine-iptables-setup[1671]: rmmod: ERROR: ../libkmod/libkmod-module.c:799 kmod_module_remove_module() could not remove 'xt_RTPENGINE': Operation not permitted
Jun 21 21:58:00 hostname ngcp-rtpengine-iptables-setup[1671]: rmmod: ERROR: could not remove module xt_RTPENGINE: Operation not permitted
Jun 21 21:58:00 hostname ngcp-rtpengine-iptables-setup[1675]: rmmod: ERROR: ../libkmod/libkmod-module.c:799 kmod_module_remove_module() could not remove 'xt_RTPENGINE': Operation not permitted
Jun 21 21:58:00 hostname ngcp-rtpengine-iptables-setup[1675]: rmmod: ERROR: could not remove module xt_RTPENGINE: Operation not permitted
Jun 21 21:58:00 hostname ngcp-rtpengine-iptables-setup[1628]: Failed to unload the kernel module xt_RTPENGINE.
Jun 21 21:58:00 hostname systemd[1]: ngcp-rtpengine-daemon.service: Failed with result 'exit-code'.
Jun 21 21:58:00 hostname systemd[1]: Failed to start NGCP RTP/media Proxy Daemon.
Jun 21 21:58:00 hostname systemd[1]: ngcp-rtpengine-daemon.service: Consumed 1.090s CPU time.

Some of these errors are legitimate. When I try the iptables command with the --id option in the container, it fails. For comparison, when I try it on the host, it succeeds, even though it's useless to me there. I don't know what to make of that.

The "Module xt_RTPENGINE not found in directory /lib/modules/5.10.0-13-amd64" error is also legitimate since the kernel module file does not exist in the container, but it's useless here, no? And, it is already loaded on the host.

I have security.privileged = true in the container's configuration, although it's possible this is the default. My understanding here has room to grow.

I thought I remembered an option in previous versions to skip managing the module and the iptables configuration entirely, and instead just assume everything had been set up properly and go with it. I think that would be helpful for a container-based implementation.

I've allowed userspace fallback since capturing these logs, and that's okay for now. Even so, I would like to work past this and restore in-kernel forwarding if possible.

I'm grateful for any guidance you might have.

Regards, Jeff

rfuchs commented 2 years ago

The --id option for the iptables setup is provided by the iptables plugin, which is not the same as the kernel module. The iptables plugin is provided by the package rtpengine-iptables and this must be installed on the system doing the iptables setup (which again is not the same as loading the kernel module).

I'm not sure about the peculiarities of the container setup (and the stock packaging doesn't support it), so you may have to modify the included rtpengine-iptables-setup script. That script is executed as an ExecStartPre and ExecStopPost by the systemd service for the daemon. You can use a systemd override to disable it completely if you want and do the iptables setup yourself instead.

The other thing you may bump into is that rtpengine now runs as a non-root user by default, which requires the kernel module to be loaded with the correct options to allow that user access. You can see the options in rtpengine-iptables-setup. Alternatively, you can use a systemd override to run rtpengine as root to make things easier. Use sudo systemctl edit rtpengine-daemon to create an override; see the systemd docs for the appropriate options.
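For illustration, loading the module with access for a non-root user could look roughly like this (the UID/GID values are placeholders, not the packaged defaults; use whatever rtpengine-iptables-setup passes for your rtpengine user):

# hypothetical IDs of the user/group rtpengine runs as
modprobe xt_RTPENGINE proc_uid=107 proc_gid=113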

jpyle490 commented 2 years ago

Excellent info. I hadn't seen what was happening in the rtpengine-daemon service file.

I notice the ngcp-rtpengine-iptables-setup script mentions another script named ngcp-virt-identify:

if [ -x "$(which ngcp-virt-identify 2>/dev/null)" ]; then
  if ngcp-virt-identify --type container; then
    VIRT="yes"
  fi
fi

When VIRT="yes" then the firewall_setup() and firewall_teardown() sections are bypassed. This seems like a good idea in my case. I don't have ngcp-virt-verify on my system, and I don't see it in the source. Only references to it.

I pseudo-solved this in my case by using the following systemd override file for rtpengine-daemon.service:

[Service]
User=root
Group=root
ExecStartPre=
ExecStopPost=

This, coupled with managing the module load manually on the host instance and the iptables config manually in the container, seems to do the trick for my simple rtpengine configuration.

rfuchs commented 2 years ago

We only use ngcp-virt-identify internally (for different purposes) and it's not included in the packaging (and we don't really support this kind of container setup), which is why it's guarded by the if -x check.

If you can figure out how to properly package this, then please submit a pull request. Your networking setup would also be interesting to know, so people can see how to make the host kernel module able to process the packets targeted at the guest.

jpyle490 commented 2 years ago

Understood. Thanks for the details. I'm not confident enough to attempt a pull request for this. I can say it works for me, but beyond that, that's a lot of pressure! I'm happy to describe my networking configuration, though.

In my configuration, OpenSIPS 3.2 and rtpengine 10.x run inside a manually created LXC container. Both the "host" system and the container are Debian 11. Many of the specifics of the network configuration aren't directly relevant to rtpengine, but I'll include them anyway for completeness. Some of this has been reused over the years and could use some updates (looking at you, ifconfig). Below are snippets of the /etc/network/interfaces file. VLAN and IP specifics have been obfuscated to protect the guilty.

The networking configuration on the host has a bond0 interface configured for LACP to a pair of stacked upstream switches:

#####################################################################
# Bonded, trunked interface to the switch

auto bond0
iface bond0 inet manual
        slaves eth0 eth2
        bond_mode 802.3ad
        bond_miimon 100
        bond_downdelay 200
        bond_updelay 200

On the switch side, the aggregated interface has all the necessary VLANs tagged; the untagged VLAN is irrelevant. On the Linux side, I configure VLAN subinterfaces and place them into bridges.

If the host needs connectivity into this VLAN, I include IP address information:

#####################################################################
# VLAN111 - where the host lives

auto vlan111
iface vlan111 inet manual
        vlan_raw_device bond0
        post-up ifconfig $IFACE up

auto br111
iface br111 inet static
        bridge_ports vlan111
        bridge_stp off
        bridge-maxwait 2
        address 11.22.33.44
        netmask 255.255.255.192
        gateway 11.22.33.1
        pre-up iptables-restore < /etc/network/iptables.rules

iface br111 inet6 static
        address 2678:abcd:ef01:111::44
        netmask 64
        gateway 2678:abcd:ef01:111::1
        pre-up ip6tables-restore < /etc/network/ip6tables.rules

If the host does not need to operate in a given VLAN, but rather only containers, I create the bridge without any layer-3 info:

#####################################################################
# VLAN222 - container networking option

auto vlan222
iface vlan222 inet manual
        vlan_raw_device bond0
        post-up ifconfig $IFACE up

auto br222
iface br222 inet manual
        bridge_ports vlan222
        bridge_stp off
        bridge-maxwait 2

This can be replicated as many times as needed, based on the number of VLANs that need to be accessed.

On the LXC portion, I configure the networking to join one or more of these bridges. From /var/lib/lxc/ContainerName/config:

lxc.net.0.type = veth
lxc.net.0.flags = up
lxc.net.0.link = br222
lxc.net.0.hwaddr = 00:FF:AA:55:66:77

One could also join a container to the same bridge where the host's VLAN operates; that's not how it works in my case. If it did, I'd run all of rtpengine on the host and skip the container-induced creativity for in-kernel forwarding.

Upon starting the container, I have a bridge status similar to this:

# brctl show
bridge name     bridge id               STP enabled     interfaces
br111           8000.001122334455       no              vlan111        # subinterface to switch via bond0
br222           8000.66778899aabb       no              veth2pjTUB     # container 1
                                                        vethQOechq     # container 2...
                                                        vlan222        # subinterface to switch via bond0

LXC, like most Linux containerization options I suppose, makes use of kernel namespaces and cgroups to provide separate-ish environments to run things. For most intents and purposes these containers look and feel like separate systems. But because their processes run on the host's kernel, certain applications that require kernel interaction get trickier.
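A quick way to see this sharing from inside the container (both commands read state that belongs to the host kernel):

uname -r                  # reports the host's kernel version
lsmod | grep RTPENGINE    # modules loaded on the host are visible here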

I compile rtpengine into Debian packages. I've managed to get in-kernel forwarding to work in a container by installing ngcp-rtpengine-kernel-dkms and ngcp-rtpengine-iptables on the host, and ngcp-rtpengine-daemon and ngcp-rtpengine-utils in the container. Although, after your explanation above, I'm not sure I need ngcp-rtpengine-iptables in either place.

The host doesn't need any specific iptables configuration, only the module. I make sure xt_RTPENGINE is included in /etc/modules-load.d/modules.conf on the host.
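For reference, the host-side load amounts to one line in that file; if the module needed parameters, those would go in an /etc/modprobe.d/ fragment instead (the fragment name below is my own choice):

# /etc/modules-load.d/modules.conf - one module name per line
xt_RTPENGINE

# /etc/modprobe.d/rtpengine.conf (hypothetical) - module parameters, if any
# options xt_RTPENGINE proc_uid=0 proc_gid=0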

The container has its own network namespace courtesy of LXC, so it's the container that needs the iptables config. I have the following included in /etc/iptables/rules.v4:

*filter
:rtpengine - [0:0]
-A INPUT -p udp -j rtpengine
COMMIT

That could probably use some optimization but it seems to do just fine.

There are restrictions on what a container can do with respect to the kernel and its modules, and I don't claim to completely understand them. To make rtpengine work like this, I've configured a systemd service override as you recommended with systemctl edit ngcp-rtpengine-daemon to allow it to run as root and to bypass the iptables setup scripts:

[Service]
User=root
Group=root
ExecStartPre=
ExecStopPost=

In my case this appears enough to satisfy the container restrictions and make functional in-kernel forwarding available.

rfuchs commented 2 years ago

Alright, thanks for the explanation. One thing I'm kinda missing in the setup is the iptables rule that hands over the received packets to the kernel module. There should be a -j RTPENGINE (upper case) rule somewhere, either on the host or in the container. This is normally done by the rtpengine-iptables-setup script and is what you need the -iptables package for. Without that rule the kernel module would remain idle. I assume you've confirmed that the kernel module is actually doing its job? 😃 (Hint: inspect /proc/rtpengine/0/list and watch the packet counters)
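As a sketch, the rule and the check would look something like this (table ID 0 is the default; adjust if you run rtpengine with a different table= setting):

# hand matching UDP packets to kernel table 0 (needs the iptables plugin)
iptables -I rtpengine -p udp -j RTPENGINE --id 0

# then watch the per-stream packet counters while a call is up
cat /proc/rtpengine/0/list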

jpyle490 commented 2 years ago

I've been playing with this quite a bit. You were right, the traffic wasn't actually making it into the kernel module.

I've discovered that some of my problems were solved by applying the correct proc_uid and proc_gid parameters when loading the xt_RTPENGINE module (since I have MANAGE_IPTABLES=no in the defaults file). Without those, rtpengine won't start, instead complaining about permission-denied errors when trying to manipulate the configured table.
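A quick sanity check, assuming the module exposes its control files under /proc/rtpengine as my build does (the ownership should match the user rtpengine runs as):

ls -l /proc/rtpengine/control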

I had experienced the same errors within the LXC container, but I assumed at the time it was because of the container itself. I'm curious whether it was really just the missing proc_uid when manually loading the module. I may do some homework on UID/GID mapping through user namespaces for LXC containers to see if I can get something to match up.

For now, I've migrated the rtpengine configuration from the container to the main/host instance. That solves the immediate functionality issues.