sos report can load overlay kmod

pmoravec commented 1 year ago

Reproducer:

a freshly rebooted system with the overlay kernel module not loaded (just unloading the module isn't sufficient, which is interesting and suggests a podman bug)
having podman or docker installed, but no plugin is running (a running plugin (always?) requires the kmod to be loaded)

Then running almost any sos report command (e.g. even sos report -o qpid --batch --build, which doesn't touch containers at all) loads the overlay kmod.

The reason is that the code detecting information about container runtimes, like https://github.com/sosreport/sos/blob/main/sos/policies/runtimes/__init__.py#L85-L87 or https://github.com/sosreport/sos/blob/main/sos/policies/runtimes/__init__.py#L85-L87, calls [podman|docker] commands that require that kmod.

Technically, following the "sos can not alter the system, at all" rule, we should not load the kmod. And when we detect the kmod is missing, we should set ContainerRuntime.active = False (elegant way of disabling those commands).
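A minimal sketch of that "disable when the kmod is missing" idea, assuming the loaded modules can be read from /proc/modules-style text. `SketchRuntime` and `loaded_kmods` are hypothetical names for illustration and are not the real sos `ContainerRuntime` API. Note the assumption that overlay is built as a module: a kernel with overlay built in would not list it in /proc/modules.

```python
def loaded_kmods(proc_modules_text):
    """Return the module names found in /proc/modules-style text.

    Each non-empty line starts with the module name, followed by size,
    reference count and other fields.
    """
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


class SketchRuntime:
    """Hypothetical stand-in for a container runtime wrapper (not sos code)."""

    def __init__(self, name, required_kmod="overlay"):
        self.name = name
        self.required_kmod = required_kmod
        self.active = False

    def check_is_active(self, proc_modules_text):
        # Only mark the runtime active when the kmod is already loaded,
        # so that no later runtime command can trigger the load.
        self.active = self.required_kmod in loaded_kmods(proc_modules_text)
        return self.active
```

On a real system the text would come from reading /proc/modules; passing it in as a string keeps the check free of side effects and easy to test.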

From a philosophical point of view, that can cause more confusion than gain: a user inspecting a sosreport would see no podman images even though they were present, undetected "just because" some kernel module was not loaded (which is hidden information for the user). Should we stick to the golden rule "do not alter the system, at all"? Or should we prevent such user confusion? Moreover, are we OK with running basically the whole podman plugin conditionally, "only when the overlay kmod is loaded"? That could be really confusing for a user (though only in the rare situation when podman is in use but the kmod is unloaded).

If we don't collect some (limited set of) commands due to the "don't alter the system" rule, the potential user confusion is limited and justified. But here..?

I would say let's:

improve reporting (in the UI and in manifest/sos.log) of situations where sos skipped collecting some data due to a predicate. Warnings like

[plugin:networking] skipped command 'ip -s macsec show': required kmods missing: macsec. Use '--allow-system-changes' to enable collection.

don't cover all scenarios. Better reporting can prevent user confusion (a description of the situations with no warning is missing, though). This applies independently of the current issue, which just reminded me of the importance of the topic.

keep with current status that we can load the overlay kmod - but document it (where?). Since the kmod is not needed every time:

(freshly rebooted RHEL9)
# lsmod | grep overlay
# podman ps
CONTAINER ID  IMAGE       COMMAND     CREATED     STATUS      PORTS       NAMES
# lsmod | grep overlay
overlay               155648  0
# rmmod overlay
# lsmod | grep overlay
# podman ps
CONTAINER ID  IMAGE       COMMAND     CREATED     STATUS      PORTS       NAMES
# lsmod | grep overlay
#

So, if the user manually removes the overlay kmod, then the kmod is no longer required by podman, nor by sosreport. (This is imho the most "ultimate" argument for the status quo answer to the philosophical question.)

maybe ask the container SMEs why podman behaves so confusingly..?
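The manual lsmod / podman ps check in the transcript above could be automated roughly as follows. This is only a sketch: `module_names`, `newly_loaded`, `read_proc_modules` and `kmods_loaded_by` are hypothetical helper names, and it assumes /proc/modules reflects the same state lsmod prints. Another process loading a module at the same moment would show up in the result too.

```python
import subprocess


def module_names(proc_modules_text):
    """Return the module names found in /proc/modules-style text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def newly_loaded(before_text, after_text):
    """Return modules present in the second snapshot but not the first."""
    return sorted(module_names(after_text) - module_names(before_text))


def read_proc_modules(path="/proc/modules"):
    """Read the kernel's list of currently loaded modules."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def kmods_loaded_by(argv):
    """Run a command and report which kmods appeared while it ran."""
    before = read_proc_modules()
    # The command's own exit status and output are not of interest here.
    subprocess.run(argv, check=False, capture_output=True)
    return newly_loaded(before, read_proc_modules())
```

Calling `kmods_loaded_by(["podman", "ps"])` once after a fresh reboot and once after `rmmod overlay` would be one way to hand the container SMEs a repeatable reproducer for the difference shown in the transcript.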

jcastill commented 1 year ago

keep with current status that we can load the overlay kmod - but document it (where?).

I'd say document it twice: once in sos.log, once on standard output. Not sure if a prompt is needed as well, i.e. 'running sos report may load overlay kmod, do you want to continue?', but at least we should make sure that users know that it will happen.

mhradile commented 1 year ago

I feel like in this case we should run it but:

Somehow make all users aware that this is the exception to the no-modification rule. Add an extra option to disable it.
Get podman to avoid this behavior, e.g. by adding a --no-mod option for us. (This does not solve the current situation.)

Alternative options I see:

Disable podman: Make sure users know that podman functionality modifies the system and must be explicitly enabled.
Reinvent the wheel: Do not use podman for detection purposes.
Confusing: Hide every podman call behind a predicate and let users know.
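For the predicate option, the "let users know" part could reuse the warning format quoted earlier in this thread. A rough sketch, where `SkippedCommand`, `format_skip` and `summarize` are made-up names for illustration and not existing sos code:

```python
from dataclasses import dataclass


@dataclass
class SkippedCommand:
    """One command that a predicate prevented from running (hypothetical)."""
    plugin: str
    command: str
    missing_kmods: tuple


def format_skip(skip):
    """Build a warning in the same shape as the macsec example above."""
    return (
        f"[plugin:{skip.plugin}] skipped command '{skip.command}': "
        f"required kmods missing: {', '.join(skip.missing_kmods)}. "
        "Use '--allow-system-changes' to enable collection."
    )


def summarize(skips):
    """Return one line per skipped command plus a final count line."""
    lines = [format_skip(skip) for skip in skips]
    lines.append(f"{len(skips)} command(s) skipped due to predicates")
    return lines
```

Emitting the final count line in the UI as well as in sos.log would give a user inspecting the report a visible hint that some data is missing on purpose.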

TurboTurtle commented 1 year ago

I'm personally really curious as to why it does not reload the kmod on subsequent calls. Perhaps we could get to the bottom of that with the podman devs/SMEs to answer our question here.

In practice, any system that has podman or docker installed that is running sos would almost certainly have something going on that causes (or already caused) podman to load the overlay kmod, so this would be an extreme edge case I think.

That being said, I'm in favor of bending the rule here given the whole context of where this crops up. We can have our container inspection methods check for overlay and, if it's missing, print a warning before we actually perform any callouts to the relevant commands.
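A rough sketch of that "warn, then proceed" approach, assuming the check is done against /proc/modules-style text and the warning goes through a standard Python logger. The function names and the message wording are placeholders, not existing sos code:

```python
import logging


def kmod_is_loaded(name, proc_modules_text):
    """Return True if the named module appears in /proc/modules-style text."""
    return any(
        line.split()[0] == name
        for line in proc_modules_text.splitlines()
        if line.strip()
    )


def warn_before_runtime_calls(runtime, proc_modules_text, log):
    """Warn if overlay is missing; return True when a warning was emitted.

    The caller still goes on to run the runtime commands afterwards,
    this only makes the possible side effect visible beforehand.
    """
    if kmod_is_loaded("overlay", proc_modules_text):
        return False
    log.warning(
        "overlay kmod is not loaded; querying %s may load it", runtime
    )
    return True
```

The logger is passed in so the same call can feed both sos.log and the terminal, depending on which handlers are attached to it.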

sos report can load overlay kmod #3334