Cgroupsv2, systemd, nftables, cgrulesengd, and filtering OUTPUT of terminal apps

morfikov commented 1 month ago

I'm trying to filter the OUTPUT packets of all internet apps using cgroupsv2 and nftables. This worked fine with cgroupsv1, but systemd wants to remove support for v1, so I had to switch to v2, and it looks like it doesn't work as it should.

Basically everything works well for GUI apps, for instance:

# egrep firefox /etc/cgrules.conf
*:firefox            cpu,io,memory,pids,rdma,misc morfikownia/user/firefox/
*:firefox-bin        cpu,io,memory,pids,rdma,misc morfikownia/user/firefox/

# nft list chain inet filter check-cgroup-user | grep firefox
    socket cgroupv2 level 3 "morfikownia/user/firefox" tcp dport { 80, 443 } counter packets 124 bytes 7440 accept comment "https/http"
    socket cgroupv2 level 3 "morfikownia/user/firefox" udp dport 443 counter packets 42 bytes 53471 accept comment "google quic protocol/http3"
    socket cgroupv2 level 3 "morfikownia/user/firefox" tcp dport { 3000, 4433, 5443, 8080, 9090 } ip daddr 192.168.1.0/24 counter packets 0 bytes 0 accept comment "for firefox non standard https/http"

But there's a problem with terminal tools like ping or ssh -- sometimes they work, and sometimes they don't. Take a look at the following example.

# egrep -ir ssh /etc/cgrules.conf
*:sshfs              cpu,io,memory,pids,rdma,misc morfikownia/user/ssh/
*:ssh                cpu,io,memory,pids,rdma,misc morfikownia/user/ssh/

# nft list chain inet filter check-cgroup-user | grep ssh
    socket cgroupv2 level 3 "morfikownia/user/ssh" meta l4proto tcp counter packets 2 bytes 120 accept

The following logs are from 3 attempts when I try to connect to the remote SSH host.

The first try:

cgrulesengd[13882]: EXEC Event: PID = 15280, tGID = 15280
cgrulesengd[13882]: Scanned proc values are 1000 1000 1000 1000
cgrulesengd[13882]: Scanned proc values are 1000 1000 1000 1000
cgrulesengd[13882]: Found matching rule * for PID: 15280, UID: 1000, GID: 1000
cgrulesengd[13882]: Executing rule * for PID 15280... Will move pid 15280 to cgroup 'morfikownia/user/ssh/'
cgrulesengd[13882]: Adding controller cpu
cgrulesengd[13882]: Adding controller io
cgrulesengd[13882]: Adding controller memory
cgrulesengd[13882]: Adding controller pids
cgrulesengd[13882]: Adding controller rdma
cgrulesengd[13882]: Adding controller misc
cgrulesengd[13882]: cgroup build procs path: /sys/fs/cgroup//morfikownia/user/ssh/cgroup.procs
cgrulesengd[13882]: cgroup build procs path: /sys/fs/cgroup//morfikownia/user/ssh/cgroup.procs
cgrulesengd[13882]: cgroup build procs path: /sys/fs/cgroup//morfikownia/user/ssh/cgroup.procs
cgrulesengd[13882]: cgroup build procs path: /sys/fs/cgroup//morfikownia/user/ssh/cgroup.procs
cgrulesengd[13882]: Warning: cgroup_attach_task_pid failed: 50001
cgrulesengd[13882]: Warning: failed to apply the rule. Error was: 50001
cgrulesengd[13882]: Cgroup change for PID: 15280, UID: 1000, GID: 1000, PROCNAME: /usr/bin/ssh FAILED! (Error Code: 50001)

$ ssh root@192.168.1.1
Enter passphrase for key '/home/morfik/.ssh/router_rsa':

It worked.

The second try:

cgrulesengd[13882]: EXEC Event: PID = 15287, tGID = 15287
cgrulesengd[13882]: Scanned proc values are 1000 1000 1000 1000
cgrulesengd[13882]: Scanned proc values are 1000 1000 1000 1000
cgrulesengd[13882]: Found matching rule * for PID: 15287, UID: 1000, GID: 1000
cgrulesengd[13882]: Executing rule * for PID 15287... Will move pid 15287 to cgroup 'morfikownia/user/ssh/'
cgrulesengd[13882]: Adding controller cpu
cgrulesengd[13882]: Adding controller io
cgrulesengd[13882]: Adding controller memory
cgrulesengd[13882]: Adding controller pids
cgrulesengd[13882]: Adding controller rdma
cgrulesengd[13882]: Adding controller misc
cgrulesengd[13882]: cgroup build procs path: /sys/fs/cgroup//morfikownia/user/ssh/cgroup.procs
cgrulesengd[13882]: cgroup build procs path: /sys/fs/cgroup//morfikownia/user/ssh/cgroup.procs
cgrulesengd[13882]: cgroup build procs path: /sys/fs/cgroup//morfikownia/user/ssh/cgroup.procs
cgrulesengd[13882]: cgroup build procs path: /sys/fs/cgroup//morfikownia/user/ssh/cgroup.procs
cgrulesengd[13882]: Warning: cgroup_attach_task_pid failed: 50001
cgrulesengd[13882]: Warning: failed to apply the rule. Error was: 50001
cgrulesengd[13882]: Cgroup change for PID: 15287, UID: 1000, GID: 1000, PROCNAME: /usr/bin/ssh FAILED! (Error Code: 50001)

$ ssh root@192.168.1.1
Enter passphrase for key '/home/morfik/.ssh/router_rsa':

It also worked.

The third try:

cgrulesengd[13882]: EXEC Event: PID = 15293, tGID = 15293
cgrulesengd[13882]: Scanned proc values are 1000 1000 1000 1000
cgrulesengd[13882]: Scanned proc values are 1000 1000 1000 1000
cgrulesengd[13882]: Found matching rule * for PID: 15293, UID: 1000, GID: 1000
cgrulesengd[13882]: Executing rule * for PID 15293... Will move pid 15293 to cgroup 'morfikownia/user/ssh/'
cgrulesengd[13882]: Adding controller cpu
cgrulesengd[13882]: Adding controller io
cgrulesengd[13882]: Adding controller memory
cgrulesengd[13882]: Adding controller pids
cgrulesengd[13882]: Adding controller rdma
cgrulesengd[13882]: Adding controller misc
cgrulesengd[13882]: cgroup build procs path: /sys/fs/cgroup//morfikownia/user/ssh/cgroup.procs
cgrulesengd[13882]: cgroup build procs path: /sys/fs/cgroup//morfikownia/user/ssh/cgroup.procs
cgrulesengd[13882]: cgroup build procs path: /sys/fs/cgroup//morfikownia/user/ssh/cgroup.procs
cgrulesengd[13882]: cgroup build procs path: /sys/fs/cgroup//morfikownia/user/ssh/cgroup.procs
cgrulesengd[13882]: Warning: cgroup_attach_task_pid failed: 50001
cgrulesengd[13882]: Warning: failed to apply the rule. Error was: 50001
cgrulesengd[13882]: Cgroup change for PID: 15293, UID: 1000, GID: 1000, PROCNAME: /usr/bin/ssh FAILED! (Error Code: 50001)
kernel: * NFTABLES:cgroup-systemd * IN= OUT=bond0 SRC=192.168.1.150 DST=192.168.1.1 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=63762 DF PROTO=TCP SPT=36268 DPT=22 SEQ=2491944508 ACK=0 WINDOW=65535 RES=0x00 SYN URGP=0 OPT (020405B40402080A51BE46C1000000000103030E) UID=1000 GID=1000
kernel: * NFTABLES:cgroup-systemd * IN= OUT=bond0 SRC=192.168.1.150 DST=192.168.1.1 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=63763 DF PROTO=TCP SPT=36268 DPT=22 SEQ=2491944508 ACK=0 WINDOW=65535 RES=0x00 SYN URGP=0 OPT (020405B40402080A51BE4AC1000000000103030E) UID=1000 GID=1000

$ ssh root@192.168.1.1
^C

But the third attempt didn't work. The packets were dropped by nftables. The question is why? The NFTABLES:cgroup-systemd label indicates that the packets didn't go where they should have in nftables:

chain OUTPUT {
...
    socket cgroupv2 level 1 "morfikownia/" counter jump check-cgroup
    socket cgroupv2 level 1 "system.slice/" counter jump check-cgroup-systemd
    socket cgroupv2 level 1 "user.slice/" counter jump check-cgroup-systemd
...

So they should go to the check-cgroup chain, and in the first two attempts they did, but in the third attempt they went to check-cgroup-systemd, and since there are no rules there for the SSH client, they were dropped. Why does this happen? With GUI apps, everything works well every time.
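
As a rough illustration of what the level argument selects, the helper below prints the ancestor of a cgroup path at a given level, which is the part that a "socket cgroupv2 level N" expression is compared against. This is a sketch based on how the nft documentation describes the ancestor level, not something confirmed in this thread; the function name cgroup_ancestor and the user.slice example path are made up for the example.

```shell
# Print the first N components of a cgroup path, i.e. the ancestor at level N.
cgroup_ancestor() {
    path=${1#/}          # drop the leading slash, if any
    printf '%s\n' "$path" | cut -d/ -f"1-$2"
}

cgroup_ancestor /morfikownia/user/ssh 1                          # morfikownia
cgroup_ancestor /morfikownia/user/ssh 3                          # morfikownia/user/ssh
cgroup_ancestor /user.slice/user-1000.slice/session-2.scope 1    # user.slice
```

Under that reading, a socket created while the process was still somewhere under user.slice would match the third OUTPUT rule and not the first one.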

When I connect to the remote SSH server and the connection succeeds, I can see that the PIDs were added in the right place:

# egrep -ir 15710 /sys/fs/cgroup
...
/sys/fs/cgroup/morfikownia/user/ssh/cgroup.procs:15710
/sys/fs/cgroup/morfikownia/user/ssh/cgroup.threads:15710

When I connect to the remote SSH server and the connection fails, the PIDs are also added in the right place:

/sys/fs/cgroup/morfikownia/user/ssh/cgroup.procs:15884
/sys/fs/cgroup/morfikownia/user/ssh/cgroup.threads:15884

So I can't figure this out -- why does it work sometimes, and sometimes it doesn't?

The second question is, what do these warnings mean?

cgrulesengd[13882]: Warning: cgroup_attach_task_pid failed: 50001
cgrulesengd[13882]: Warning: failed to apply the rule. Error was: 50001
cgrulesengd[13882]: Cgroup change for PID: 15280, UID: 1000, GID: 1000, PROCNAME: /usr/bin/ssh FAILED! (Error Code: 50001)

It looks like it happens all the time on my system, whether it works or not:

# export CGROUP_LOGLEVEL=debug
# cgexec ssh 192.168.1.1
Found cgroup option cpuset, count 0
Found cgroup option cpu, count 1
Found cgroup option io, count 2
Found cgroup option memory, count 3
Found cgroup option pids, count 4
Found cgroup option rdma, count 5
Found cgroup option misc, count 6
Found cgroup option cgroup, count 7
Unable to read /var/run/libcgroup/systemd , continuing without systemd default cgroup.
My euid and egid is: 0,0
Not using cached rules for PID 15964.
Parsing configuration file /etc/cgrules.conf.
Added rule * (UID: -2, GID: -2) -> morfikownia/user/ssh/ for controllers: cpu io memory pids rdma misc
Parsing of configuration file complete.

Found matching rule * for PID: 15964, UID: 0, GID: 0
Executing rule * for PID 15964... Will move pid 15964 to cgroup 'morfikownia/user/ssh/'
Adding controller cpu
Adding controller io
Adding controller memory
Adding controller pids
Adding controller rdma
Adding controller misc
cgroup build procs path: /sys/fs/cgroup//morfikownia/user/ssh/cgroup.procs
cgroup build procs path: /sys/fs/cgroup//morfikownia/user/ssh/cgroup.procs
cgroup build procs path: /sys/fs/cgroup//morfikownia/user/ssh/cgroup.procs
cgroup build procs path: /sys/fs/cgroup//morfikownia/user/ssh/cgroup.procs
Warning: cgroup_attach_task_pid failed: 50001
Warning: failed to apply the rule. Error was: 50001
cgroup change of group failed

So what's going on?

kamalesh-babulal commented 3 weeks ago

@morfikov Thanks for reporting the issue, does adding the following help?

*:sshd                cpu,io,memory,pids,rdma,misc morfikownia/user/ssh/

morfikov commented 3 weeks ago

No, why should it? I'm using ssh from my client machine to connect to the remote SSH server. I'm trying to configure cgroups on the client to filter OUTPUT packets. SSHD is a server, and it's not even started on the client.

ssh was just an example; the same thing happens with ping, curl, and other terminal apps.

kamalesh-babulal commented 3 weeks ago

@morfikov Ah, my bad. I misread it.

morfikov commented 3 weeks ago

@kamalesh-babulal Andrei Borzenkov from the systemd-devel mailing list suggested that the problem may lie in the way nftables checks things related to cgroups:

Not really. nftables checks the socket cgroup, not the process cgroup. The socket may have been created while process was in the old cgroup.

That would explain the weird behavior.
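
To look at the process side of that mismatch, the cgroup a process is in right now can be read from /proc/<pid>/cgroup. The sketch below is only an illustration: the function name cgroup_v2_path is made up, and it assumes the unified hierarchy, where the entry starts with "0::". For the socket side, recent iproute2 versions of ss have a --cgroup option, if your version supports it.

```shell
# Print the cgroup v2 path recorded in a /proc/<pid>/cgroup-style file.
# On the unified hierarchy that is the line starting with "0::".
cgroup_v2_path() {
    sed -n 's/^0:://p' "$1"
}

# Example: the cgroup of the current shell, if /proc is available.
if [ -r "/proc/$$/cgroup" ]; then
    cgroup_v2_path "/proc/$$/cgroup"
fi
```

Running it against the PID of a freshly started ssh shows where the process ended up, which is not necessarily where its socket was created.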

Can this be fixed in libcgroup, or should I ask the kernel developers about this issue?

kamalesh-babulal commented 3 weeks ago

@morfikov AFAIK, it's the kernel behavior: the socket is not migrated when the task migrates to another cgroup. One idea is to use the transient systemd equivalent, which is reusable and is called a delegated scope:

cgcreate -c -gcpu,io,memory,pids,rdma,misc:morfikownia.slice/user.scope
cgexec -gcpu:morfikownia.slice/user.scope <command>

Instead of plain ssh, use cgexec + ssh as shown above. It will create the sockets in the expected cgroup, so the race window should not be hit, which makes cgrules.conf redundant. Also, change /etc/cgrules.conf for existing pids:

*:sshfs              cpu,io,memory,pids,rdma,misc morfikownia.slice/user.scope
*:ssh                cpu,io,memory,pids,rdma,misc morfikownia.slice/user.scope/

I do not know why user.scope is not enabling controllers other than cpu; I will try debugging it.

morfikov commented 3 weeks ago

Yes, that worked. I just tested with ssh and ping:

chain OUTPUT {
....
    socket cgroupv2 level 1 "morfikownia.slice/" counter jump check-cgroup-morfikownia-user-slice

chain check-cgroup-morfikownia-user-slice {

    socket cgroupv2 level 2 "morfikownia.slice/user.scope/"       meta l4proto tcp               counter accept
    socket cgroupv2 level 2 "morfikownia.slice/user.scope/"       meta l4proto icmp              counter accept
}
...

Also added corresponding entries to the cgrules.conf file and tried to exec ssh and ping via cgexec in a loop a few times:

# nft list chain inet filter check-cgroup-morfikownia-user-slice
table inet filter {
        chain check-cgroup-morfikownia-user-slice {
                socket cgroupv2 level 2 "morfikownia.slice/user.scope" meta l4proto tcp counter packets 100 bytes 6000 accept
                socket cgroupv2 level 2 "morfikownia.slice/user.scope" meta l4proto icmp counter packets 100 bytes 8400 accept
        }
}

So now it catches every single time, and there are no drops.

So how to make it work using only the cgrules.conf file?

drakenclimber commented 2 weeks ago

The second question is, what do these warnings mean?

cgrulesengd[13882]: Warning: cgroup_attach_task_pid failed: 50001
cgrulesengd[13882]: Warning: failed to apply the rule. Error was: 50001
cgrulesengd[13882]: Cgroup change for PID: 15280, UID: 1000, GID: 1000, PROCNAME: /usr/bin/ssh FAILED! (Error Code: 50001)

This error is likely coming from here. I'm guessing that your cgroup.subtree_control file is empty, and thus the above error. libcgroup is (perhaps erroneously) expecting you to have enabled at least one controller.
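
A quick way to check that guess might look like the sketch below. The function name controllers_enabled is made up for the example, and which cgroup directory libcgroup actually inspects is an assumption; pass whichever directory you want to check.

```shell
# Succeed if the given cgroup directory has at least one controller listed
# in cgroup.subtree_control. cgroupfs files report size 0, so the content
# is read instead of testing the file size.
controllers_enabled() {
    [ -n "$(cat "$1/cgroup.subtree_control" 2>/dev/null | tr -d '[:space:]')" ]
}

# Example (path taken from the logs above):
# controllers_enabled /sys/fs/cgroup/morfikownia/user || echo "no controllers enabled"
```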

So how to make it work using only the cgrules.conf file?

I'm not sure that you can on a cgroup v2 system. systemd owns the entire cgroup hierarchy, so technically they are "right" in this case. I'd guess that libcgroup and the kernel/systemd are in a race condition for the placement of the process in a cgroup. Sometimes you win, sometimes you lose.

If you want to be certain that your process is running in the correct cgroup, you'll likely want to do something like the following:

Create a delegated cgroup with your parent process as the first PID in the cgroup
Then any processes (ssh, ping, etc.) you launch from the parent pid will inherit the parent pid's cgroup by default and systemd won't try to move them

Having cgrules move a process violates systemd's single writer rule. On a v1 system it was pretty easy to get away with such a solution, but it's much harder to do it safely on a v2 system. (In fact, it may not be possible.)
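
The inheritance part can be seen with a small sketch like the following, which compares the cgroup of the current shell with that of a child shell. It assumes a Linux /proc with a cgroup v2 ("0::") entry; where that entry is missing, both values are simply empty.

```shell
# cgroup of this shell
parent=$(sed -n 's/^0:://p' "/proc/$$/cgroup" 2>/dev/null || true)

# cgroup of a child shell ($$ is expanded by the child, not by this shell)
child=$(sh -c 'sed -n "s/^0:://p" "/proc/$$/cgroup" 2>/dev/null || true')

if [ "$parent" = "$child" ]; then
    echo "child stayed in the parent's cgroup: ${child:-<no v2 entry>}"
else
    echo "child was moved: $parent -> $child"
fi
```

So if the parent shell already sits in the delegated cgroup, the ssh or ping it launches starts there, and its sockets are created there too.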

morfikov commented 2 weeks ago

This error is likely coming from here. I'm guessing that your cgroup.subtree_control file is empty, and thus the above error. libcgroup is (perhaps erroneously) expecting you to have enabled at least one controller.

Yes, that was the case.

Now basically it works only when I use cgexec:

 # cgexec ping wp.pl -c 4
Found cgroup option cpuset, count 0
Found cgroup option cpu, count 1
Found cgroup option io, count 2
Found cgroup option memory, count 3
Found cgroup option pids, count 4
Found cgroup option rdma, count 5
Found cgroup option misc, count 6
Found cgroup option cgroup, count 7
Unable to read /var/run/libcgroup/systemd , continuing without systemd default cgroup.
My euid and egid is: 0,0
Not using cached rules for PID 4945.
Parsing configuration file /etc/cgrules.conf.
Added rule * (UID: -2, GID: -2) -> morfikownia/user/iputils/ for controllers: cpu memory pids
Parsing of configuration file complete.

Found matching rule * for PID: 4945, UID: 0, GID: 0
Executing rule * for PID 4945... Will move pid 4945 to cgroup 'morfikownia/user/iputils/'
Adding controller cpu
Adding controller memory
Adding controller pids
cgroup build procs path: /sys/fs/cgroup//morfikownia/user/iputils/cgroup.procs
cgroup build procs path: /sys/fs/cgroup//morfikownia/user/iputils/cgroup.procs
cgroup build procs path: /sys/fs/cgroup//morfikownia/user/iputils/cgroup.procs
OK!
PING wp.pl (212.77.98.9) 56(84) bytes of data.
64 bytes from www.wp.pl (212.77.98.9): icmp_seq=1 ttl=51 time=32.8 ms
64 bytes from www.wp.pl (212.77.98.9): icmp_seq=2 ttl=51 time=31.4 ms
64 bytes from www.wp.pl (212.77.98.9): icmp_seq=3 ttl=51 time=26.5 ms
64 bytes from www.wp.pl (212.77.98.9): icmp_seq=4 ttl=51 time=25.0 ms

--- wp.pl ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3004ms
rtt min/avg/max/mdev = 24.971/28.910/32.763/3.262 ms

kamalesh-babulal commented 2 weeks ago

@morfikov There is not much help from the Linux kernel for tracking and migrating sockets along with the tasks that opened them. Another bash hack would be to alias ssh, as in alias ssh cgexec....., given that the cgroup is predictable. For a daemon such as the ssh server, it would work without cgexec: once the server is placed in the right cgroup, all its spawned threads stay in the same cgroup as sshd. Here, however, it is a one-time command.
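
A minimal sketch of that hack, using shell functions instead of aliases so that arguments pass through cleanly. The cgexec invocation reuses the one shown earlier in this thread; CG_TARGET, CGEXEC and run_in_cgroup are hypothetical names introduced only for this example.

```shell
# Hypothetical settings: target cgroup and the launcher binary.
CG_TARGET="${CG_TARGET:-cpu:morfikownia.slice/user.scope}"
CGEXEC="${CGEXEC:-cgexec}"

run_in_cgroup() {
    # Start the program inside the cgroup so its sockets are created there.
    $CGEXEC -g "$CG_TARGET" "$@"
}

# Shadow the commands in interactive shells. cgexec looks the program up
# in PATH, so these functions do not call themselves recursively.
ssh()  { run_in_cgroup ssh "$@"; }
ping() { run_in_cgroup ping "$@"; }
```

Placed in something like ~/.bashrc, this only affects commands typed in that shell; scripts and other launchers would still start the plain binaries.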

morfikov commented 2 weeks ago

I thought about aliases, but I'll probably get rid of systemd.

morfikov commented 2 weeks ago

I can't figure one thing out. I'm trying to set cgroup path as: /sys/fs/cgroup/${CG_SLICE}/${CG_SCOPE}/${CG_USER_DIR}/${USERAPP}

The following command works:

# cgcreate -S -c -g cpu,io,memory,pids:${CG_SLICE}/${CG_SCOPE}

The next command would be:

# cgcreate -g cpu,io,memory,pids:${CG_USER_DIR}

But this doesn't work. What worked was:

# cgcreate -g cpu,pids:${CG_USER_DIR}

It looks like only two controllers can be used, i.e. cpu and pids. Why not all four?

The next command would be:

# cgcreate -g cpu,pids:${CG_USER_DIR}/${USERAPP}

But this one doesn't work, and basically no controllers can be specified.

Am I missing something?

kamalesh-babulal commented 1 week ago

@morfikov The kernel enforces the rule that a controller cannot be enabled if a task is running in that cgroup. One idea is to create ${CG_USER_DIR} under the .scope, move the idle task created by libcgroup from ${CG_SLICE}/${CG_SCOPE} to ${CG_SLICE}/${CG_SCOPE}/${CG_USER_DIR}, and then enable the controllers in ${CG_SCOPE}. This can be achieved using the following:

cgcreate -c -g cpu,io,memory,pids:${CG_SLICE}/${CG_SCOPE}
cgcreate -g:${CG_SLICE}/${CG_SCOPE}/${CG_USER_DIR}
pid=$(cgget -n -v -r cgroup.procs ${CG_SLICE}/${CG_SCOPE})
cgset -r cgroup.procs="$pid" ${CG_SLICE}/${CG_SCOPE}/${CG_USER_DIR}
cgset -r cgroup.subtree_control="+cpu +cpuset +io +memory +pids" ${CG_SLICE}/${CG_SCOPE}

A word of caution, though: ensure that one task is always alive (not necessarily running) under ${CG_SLICE}/${CG_SCOPE} or under any child cgroup of the scope. Otherwise ${CG_SCOPE} will get removed. In this case, libcgroup_systemd_idle_thread will also be running under ${CG_SLICE}/${CG_SCOPE}/${CG_USER_DIR}.
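
Before writing to cgroup.subtree_control, it can help to confirm that the cgroup itself no longer holds any process. The following is only a sketch; the function name cgroup_has_procs is made up for the example.

```shell
# Succeed if the given cgroup directory lists at least one PID in cgroup.procs.
cgroup_has_procs() {
    [ -n "$(cat "$1/cgroup.procs" 2>/dev/null)" ]
}

# Example:
# if cgroup_has_procs "/sys/fs/cgroup/${CG_SLICE}/${CG_SCOPE}"; then
#     echo "move the remaining tasks to a child cgroup first"
# fi
```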

morfikov commented 1 week ago

@kamalesh-babulal Yes, that works fine, but there's one thing -- what if I want to have multiple dirs under ${CG_SLICE}/${CG_SCOPE}? For instance:

${CG_SLICE}/${CG_SCOPE}/${CG_USER_DIR}
${CG_SLICE}/${CG_SCOPE}/${CG_SYS_DIR}

In such a case, the first path works just fine, but for the second path there's no way to add controllers.

kamalesh-babulal commented 1 week ago

@morfikov I am sorry, I do not understand the question. I am assuming you want to enable controllers for all new child cgroups. How about something like below:

# create the slice and scope
cgcreate -c -g cpu,io,memory,pids:${CG_SLICE}/${CG_SCOPE}

# create tmp cgroup
cgcreate -g:${CG_SLICE}/${CG_SCOPE}/_tmp

# move the idle task to tmp cgroup
pid=$(cgget -n -v -r cgroup.procs ${CG_SLICE}/${CG_SCOPE})
cgset -r cgroup.procs="$pid" ${CG_SLICE}/${CG_SCOPE}/_tmp

# Enable the cgroup controllers to the scope
cgset -r cgroup.subtree_control="+cpu +cpuset +io +memory +pids" ${CG_SLICE}/${CG_SCOPE}

# create user dir
cgcreate -g:${CG_SLICE}/${CG_SCOPE}/${CG_USER_DIR}

# enable controllers on user dir
cgset -r cgroup.subtree_control="+cpu +cpuset +io +memory +pids" ${CG_SLICE}/${CG_SCOPE}/${CG_USER_DIR}

# create sys dir
cgcreate -g:${CG_SLICE}/${CG_SCOPE}/${CG_SYS_DIR}

# enable controllers on sys dir
cgset -r cgroup.subtree_control="+cpu +cpuset +io +memory +pids" ${CG_SLICE}/${CG_SCOPE}/${CG_SYS_DIR}
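
With more than two child directories, the repeated create/enable pair could be wrapped in a loop. This is a sketch of the per-directory steps only; it assumes the scope was already created, the idle task moved to _tmp, and the controllers enabled on the scope as shown above. The default values, the apps-sys name, RUN and setup_child are illustrative assumptions.

```shell
# Assumed values for illustration; the commands above only name the variables.
CG_SLICE=${CG_SLICE:-morfikownia.slice}
CG_SCOPE=${CG_SCOPE:-libcgroup.scope}
CONTROLLERS="+cpu +cpuset +io +memory +pids"
RUN=${RUN-echo}    # dry run by default: only prints the commands; set RUN= to execute

setup_child() {
    # $1: name of the child cgroup to create under the scope
    $RUN cgcreate -g ":${CG_SLICE}/${CG_SCOPE}/$1"
    $RUN cgset -r cgroup.subtree_control="$CONTROLLERS" "${CG_SLICE}/${CG_SCOPE}/$1"
}

for dir in apps-user apps-sys; do
    setup_child "$dir"
done
```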

morfikov commented 1 week ago

Yes, this is it.

kamalesh-babulal commented 1 week ago

@morfikov Can we close this issue?

morfikov commented 1 week ago

I think so. But there's one thing. Those commands work well when run from a script in a terminal. But when I try to make it work at boot, I get the following error:

cgcreate: can't create cgroup morfikownia.slice/libcgroup.scope: Cgroup operation failed
Error: failed to open the system bus: 2

I'm trying to run my script via the following systemd service:

[Unit]
Description=Control Group configuration service
ConditionDirectoryNotEmpty=/sys/fs/cgroup/
ConditionFileIsExecutable=/opt/skrypty/cgstart
DefaultDependencies=no
Requires=cgrulesengd.service
Before=sysinit.target nftables.service network-pre.target umount.target shutdown.target
After=cgrulesengd.service
Conflicts=umount.target shutdown.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/opt/skrypty/cgstart
OOMScoreAdjust=-800

[Install]
WantedBy=sysinit.target

kamalesh-babulal commented 1 week ago

@morfikov I am not an expert on systemd service files. Does adding After=dbus.service cgrulesengd.service and removing sysinit.target from Before= help?

Before=nftables.service network-pre.target umount.target shutdown.target
After=dbus.service cgrulesengd.service

morfikov commented 1 week ago

Yes, that helped:

Requires=cgrulesengd.service dbus.service
After=cgrulesengd.service dbus.service
Before=nftables.service network-pre.target umount.target shutdown.target

morfikov commented 1 week ago

@kamalesh-babulal One more question.

To make it work for regular users (members of some group), I need to allow adding pids to the cgroup.procs file. I tried to create cgroup paths using: cgcreate -a root:root -t root:cgroups... but all files are still owned by root:root. So how can I make it work?

kamalesh-babulal commented 1 week ago

@morfikov This is a little tricky compared to cgroup v1, where just changing the permissions or group ownership of the cgroup's tasks file is sufficient. cgroup v2, however, has an enforced rule that says the user writing pid into destination cgroup should have permission to write on the nearest common ancestor of both source and destination cgroup. This prevents moving tasks from the delegated subtree to a non-delegated subtree. The suggested approach is for the root user to move the parent, or first task, of the user into the delegated subtree, so that all tasks forked by that first task are under the delegated subtree and can be moved freely between the child cgroups under it.

The "Cgroup delegation containment rules" section of https://man7.org/linux/man-pages/man7/cgroups.7.html outlines the rules for delegation.
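
To make the "nearest common ancestor" part concrete, here is a small sketch that computes it for two cgroup paths. The function name common_ancestor is made up, and the user.slice session path is an assumed example of where a login shell might live under systemd.

```shell
# Print the nearest common ancestor of two cgroup paths
# (paths are relative to the cgroup root, e.g. /a/b/c).
common_ancestor() {
    a=${1#/}; b=${2#/}; out=""
    while [ -n "$a" ] && [ -n "$b" ]; do
        ha=${a%%/*}; hb=${b%%/*}        # first remaining component of each path
        [ "$ha" = "$hb" ] || break
        out="$out/$ha"
        # Drop the matched component; finish when nothing is left.
        case $a in */*) a=${a#*/} ;; *) a="" ;; esac
        case $b in */*) b=${b#*/} ;; *) b="" ;; esac
    done
    printf '%s\n' "${out:-/}"
}

# Moving between two children of the same delegated subtree:
common_ancestor /morfikownia.slice/libcgroup.scope/apps-user/ssh \
                /morfikownia.slice/libcgroup.scope/apps-user/iputils
# Moving from a login session into the delegated subtree:
common_ancestor /user.slice/user-1000.slice/session-2.scope \
                /morfikownia.slice/libcgroup.scope/apps-user/iputils
```

The first call prints /morfikownia.slice/libcgroup.scope/apps-user and the second prints /, so in the second case the rule refers to the cgroup root.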

morfikov commented 1 week ago

that says the user writing pid into destination cgroup should have permission to write on the nearest common ancestor of both source and destination cgroup.

Does this mean that in the case of systemd, the common ancestor would be the root path, i.e. /sys/fs/cgroup/? If so, is there no way to make it work for regular users?

morfikov commented 1 week ago

@kamalesh-babulal

I made it work:

# chown root:cgroups /usr/bin/cgexec
# chmod 2750 /usr/bin/cgexec

# chown root:cgroups /sys/fs/cgroup/cgroup.procs
# chown root:cgroups /sys/fs/cgroup/cgroup.threads
# chmod 660 /sys/fs/cgroup/cgroup.procs
# chmod 660 /sys/fs/cgroup/cgroup.threads
# find /sys/fs/cgroup/morfikownia.slice -iname cgroup.procs | while read pid; do chown root:cgroups $pid; chmod 660 $pid; done
# find /sys/fs/cgroup/morfikownia.slice -iname cgroup.threads | while read thread; do chown root:cgroups $thread; chmod 660 $thread; done

Now it works as a regular user:

 $ cgexec ping wp.pl -c 4
Found cgroup option cpuset, count 0
Found cgroup option cpu, count 1
Found cgroup option io, count 2
Found cgroup option memory, count 3
Found cgroup option pids, count 4
Found cgroup option rdma, count 5
Found cgroup option misc, count 6
Found cgroup option cgroup, count 7
Unable to read /var/run/libcgroup/systemd , continuing without systemd default cgroup.
My euid and egid is: 1000,5060
Not using cached rules for PID 11165.
Parsing configuration file /etc/cgrules.conf.
Added rule * (UID: -2, GID: -2) -> morfikownia.slice/libcgroup.scope/apps-user/iputils/ for controllers: cpu cpuset io memory pids
Parsing of configuration file complete.

Found matching rule * for PID: 11165, UID: 1000, GID: 1000
Executing rule * for PID 11165... Will move pid 11165 to cgroup 'morfikownia.slice/libcgroup.scope/apps-user/iputils/'
Adding controller cpu
Adding controller cpuset
Adding controller io
Adding controller memory
Adding controller pids
cgroup build procs path: /sys/fs/cgroup//morfikownia.slice/libcgroup.scope/apps-user/iputils/cgroup.procs
cgroup build procs path: /sys/fs/cgroup//morfikownia.slice/libcgroup.scope/apps-user/iputils/cgroup.procs
cgroup build procs path: /sys/fs/cgroup//morfikownia.slice/libcgroup.scope/apps-user/iputils/cgroup.procs
cgroup build procs path: /sys/fs/cgroup//morfikownia.slice/libcgroup.scope/apps-user/iputils/cgroup.procs
cgroup build procs path: /sys/fs/cgroup//morfikownia.slice/libcgroup.scope/apps-user/iputils/cgroup.procs
OK!
PING wp.pl (212.77.98.9) 56(84) bytes of data.
64 bytes from www.wp.pl (212.77.98.9): icmp_seq=1 ttl=51 time=178 ms
64 bytes from www.wp.pl (212.77.98.9): icmp_seq=2 ttl=51 time=28.1 ms
64 bytes from www.wp.pl (212.77.98.9): icmp_seq=3 ttl=51 time=31.0 ms
64 bytes from www.wp.pl (212.77.98.9): icmp_seq=4 ttl=51 time=50.1 ms

--- wp.pl ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3003ms
rtt min/avg/max/mdev = 28.054/71.672/177.514/61.690 ms
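
The two find loops above could probably be condensed into a single pass. The sketch below is an assumption-laden variant, not the exact commands used here: grant_group_write is a made-up name, it uses chgrp instead of chown root:cgroups, and it matches file names case-sensitively. It does not touch the files directly under /sys/fs/cgroup/ unless that directory is passed in.

```shell
# Give a group read/write access to every cgroup.procs and cgroup.threads
# file below a directory.
grant_group_write() {
    # $1: cgroup subtree, $2: group name or numeric GID
    find "$1" \( -name cgroup.procs -o -name cgroup.threads \) \
        -exec chgrp "$2" {} + -exec chmod 660 {} +
}

# Example (as root):
# grant_group_write /sys/fs/cgroup/morfikownia.slice cgroups
```

Since cgroupfs does not persist ownership, cgroups created later would need the same treatment again.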

kamalesh-babulal commented 1 week ago

@morfikov Super cool workaround! Thanks for sharing it.

libcgroup / libcgroup

Cgroupsv2, systemd, nftables, cgrulesengd, and filtering OUTPUT of terminal apps #432