m-erhardt / check-container-stats

Icinga / Nagios plugins to check metrics of Docker / PodMan containers
GNU General Public License v3.0
8 stars 5 forks source link

Add support for rootless containers #8

Closed leeclemens closed 1 year ago

leeclemens commented 1 year ago

Adds support for providing a username to access rootless containers. Defaults to root.

Fixes #7

leeclemens commented 1 year ago

While I would prefer f-strings (or .format()), I think it is more readable using the literal braces and not escaping them (e.g. {{{{.Name}}}}); so I opted to use the "%s" % var format.

m-erhardt commented 1 year ago

First of all: thanks for your PR! Support for rootless PodMan containers is indeed something check_container_stats_podman.py is still lacking.

I work with rootless Docker containers on a daily basis but unfortunately do not use rootless PodMan containers at the moment. So please allow me some time to set up a test system with rootless PodMan containers before merging your PR.

With regards to your PR: the daemonless nature of PodMan indeed makes it hard to check rootless containers. From an engineering-perspective I like your systemd-run-approach.

Unfortunately this approach has some shortcomings we still need to cover before merging.

I mostly work in the RHEL-ecosystem so I tested your wrapping-command on RHEL 7/8/9 platforms. It seems the command only works on RHEL9 plattforms and returns errors on on RHEL7 & 8 (which would make the plugin unusable on these platforms).

# Rocky Linux release 9.1
[root@rocky9]# /usr/bin/systemd-run --machine=root@ --quiet --user --collect --pipe --wait echo test
test

# Rocky Linux release 8.7
[root@rocky8]# /usr/bin/systemd-run --machine=root@ --quiet --user --collect --pipe --wait echo test
Execution in user context is not supported on non-local systems.

# CentOS 7.9
[root@centos7]# /usr/bin/systemd-run --machine=root@ --quiet --user --collect --pipe --wait echo test
/usr/bin/systemd-run: unrecognized option '--pipe'

Regarding pylint's consider-using-f-string: I've personally observed f-strings performing better than "%s" % var-replacements so I mostly use them.

What's the reasoning behind the additional shlex.split()? As our command is rather static I'd prefer manually splitting the command instead of the (arguably minimal) performance impact of the additional import and split()-operation.

m-erhardt commented 1 year ago

I've had some more thoughts on your PR. The way you currently implemented the feature it will introduce a breaking change and decrease compatibility across OS platforms.

How about making the systemd-run-wrapping optional? This would not break functionality when checking root-containers on older OS-platforms but still allow to check rootless PodMan containers.

Probably something like this:

def get_args():
    # [...]
    parser.add_argument("--rootless", required=False, action='store_true',
                        help="Check rootless container (via systemd-run)",
                        type=bool, default=False, dest='rootless')
    parser.add_argument("-u", "--user", required=False,
                        help="Username of container (only rootless)",
                        type=str, dest='user')
    # [...]

def get_container_stats(args, docker_env):

    # [...]
    # systemd-run wrap-command for checking rootless podman container
    rl_wrap_cmd: list = ['/usr/bin/systemd-run', f'--machine={ args.user }@', '--quiet', '--user',
                         '--collect', '--pipe', '--wait']

    pm_stat_cmd: list = ['podman', 'stats', args.container_name, '--no-stream', '--format',
                         '"{{.Name}},{{.ID}},{{.CPUPerc}},{{.MemUsage}},{{.NetIO}},{{.BlockIO}},{{.PIDs}}"']

    if args.rootless is True:
        # Prepend "podman stats" with "systemd-run" wrapper
        pm_stat_cmd = rl_wrap_cmd + pm_stat_cmd

    result = subprocess.run(pm_stat_cmd,
                            shell=False,
                            check=False,
                            env=docker_env,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    # [...]

def get_container_pslist(args, docker_env):
    # same principle as get_container_stats()
josy1024 commented 1 year ago

have an error message for nrpe + rootless container montitoring, ideas?

details: https://github.com/m-erhardt/check-container-stats/issues/7#issuecomment-1641021876

diags on rocky linux 9.1 UNKNOWN - docker stats command returned error: b'Failed to connect to bus: Permission denied Failed to start transient service unit: Transport endpoint is not connected

(/usr/bin/systemd-run --machine=mysql-user@ --quiet --user --collect --pipe --wait whoami) mysql-user

/usr/bin/systemd-run --machine=nrpe@ --quiet --user --collect --pipe --wait whoami nrpe

leeclemens commented 1 year ago

Apologies @m-erhardt I think a notification was lost and this drifted off of my radar. I agree making it optional is probably the best approach, although I was trying to avoid that and find a one-size-fits-all solution.

Regarding shlex, I thought I replied...it may have been lost as a draft. In this case, it may not matter a lot. I guess the username is user-entered data, so I defaulted to using it. I tend to prefer it and see the import executing every 5 minutes as being negligible. I completely understand a more purist approach, just my opinion.

leeclemens commented 1 year ago

I'll need to do some more testing, but it looks like using cgroupsv2 and su can achieve the same (tested with both ~RHEL 8 and ~RHEL 9). I'll open a new PR against the same Issue once I get it working.

leeclemens commented 1 year ago

I know you weren't thrilled about importing shlex, but I limited it to shlex.strip and got it working.

https://github.com/m-erhardt/check-container-stats/pull/9