Open pmoravec opened 1 year ago
> keep with current status that we can load the `overlay` kmod - but document it (where?).
I'd say document it twice: once in sos.log and once on standard output. I'm not sure whether a prompt is needed as well, e.g. 'running sos report may load overlay kmod, do you want to continue?', but at least we should make sure that users know that it will happen.
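A minimal sketch of the "document twice" idea, assuming plain Python `logging` rather than sos's own logging setup. The function name `make_notice_logger`, the logger name and the message text are made up for illustration:

```python
import logging
import sys


def make_notice_logger(log_path, stream=sys.stdout):
    """Build a logger that writes each message to a file and to a stream.

    Hypothetical helper, not part of sos: log_path stands in for sos.log
    and stream stands in for the standard output of the sos run.
    """
    logger = logging.getLogger("overlay_notice_sketch")
    logger.setLevel(logging.WARNING)
    # Start from a clean state so repeated calls do not duplicate output.
    logger.handlers.clear()
    logger.propagate = False
    logger.addHandler(logging.FileHandler(log_path))
    logger.addHandler(logging.StreamHandler(stream))
    return logger
```

Calling `make_notice_logger("sos.log").warning("running sos report may load the overlay kmod")` would then put the same notice in both places; whether an interactive prompt is added on top is a separate decision.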
I feel like in this case we should run it but:
Alternative options I see:
I'm personally really curious as to why it does not reload the kmod on subsequent calls. Perhaps we could get to the bottom of that with the podman devs/SMEs to answer our question here.
In practice, any system that has podman or docker installed that is running sos would almost certainly have something going on that causes (or already caused) podman to load the overlay kmod, so this would be an extreme edge case I think.
That being said, I'm in favor of bending the rule here given the whole context of where this crops up. We can have our container inspection methods check for `overlay` and, if it's missing, print a warning before we actually perform any call outs to the relevant commands.
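The check-then-warn step could look roughly like the sketch below. It assumes the loaded modules are read from `/proc/modules` and ignores the case where overlay is built into the kernel; `kmod_loaded` and `warn_if_overlay_missing` are hypothetical names, not existing sos functions:

```python
import sys


def kmod_loaded(name, modules_text):
    """Return True if `name` is listed as a module in /proc/modules text."""
    for line in modules_text.splitlines():
        fields = line.split()
        # The first field of each /proc/modules line is the module name.
        if fields and fields[0] == name:
            return True
    return False


def warn_if_overlay_missing(modules_text, out=sys.stderr):
    """Print a warning and return True when the overlay kmod is not loaded."""
    if kmod_loaded("overlay", modules_text):
        return False
    print("warning: overlay kmod is not loaded; container runtime queries "
          "may load it", file=out)
    return True


if __name__ == "__main__":
    try:
        with open("/proc/modules") as f:
            text = f.read()
    except OSError:
        # Treat an unreadable /proc/modules as "nothing loaded".
        text = ""
    warn_if_overlay_missing(text)
```

The warning is emitted before any `podman` or `docker` command runs, so the user sees it even if the subsequent call is what loads the module.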
Reproducer:

- `overlay` kernel module not loaded (just unloading the module isn't sufficient, that is interesting - and suggests a podman bug)
- `podman` or `docker` installed, but no plugin is running (running plugin (always?) requires the kmod to be loaded)

Then running almost any `sos report` command (e.g. even `sos report -o qpid --batch --build` that doesn't touch containers at all) does load the `overlay` kmod.

The reason is that the code detecting information about container runtimes, like https://github.com/sosreport/sos/blob/main/sos/policies/runtimes/__init__.py#L85-L87 or https://github.com/sosreport/sos/blob/main/sos/policies/runtimes/__init__.py#L85-L87, calls [podman|docker] commands that do require that kmod.
Technically, following the "sos can not alter the system, at all" rule, we should not load the kmod. And when we detect the kmod is missing, we should set `ContainerRuntime.active = False` (an elegant way of disabling those commands).

From a philosophical point of view, that can cause more confusion than gain: a user inspecting the sosreport would see no podman images even though they were present - they were not detected "just because" some kernel module was not loaded (which is hidden information for the user). Should we stick to the golden rule "not alter the system, at all"? Or should we prevent such user confusion? Moreover, are we OK to also run basically the whole `podman` plugin conditionally, "only when overlay kmod is loaded"? That could be really confusing for a user (though only in the rare situation when podman is in use but the kmod is unloaded).

If we don't collect some (limited set of) commands due to the "don't alter the system" rule, the potential user confusion is limited and justified. But here..?
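To make the "strict" option concrete, here is a rough sketch of a runtime wrapper that deactivates itself when the kmod is missing. `SketchRuntime` is a hypothetical stand-in, not sos's `ContainerRuntime`; only the `active` attribute name is borrowed from the text above, and `skip_reason` is an invented field:

```python
class SketchRuntime:
    """Hypothetical stand-in for a container runtime wrapper (not sos code)."""

    def __init__(self, name, loaded_kmods, required_kmod="overlay"):
        self.name = name
        # Deactivate the runtime when the required kmod is absent, so that
        # no command that could load it is ever executed.
        self.active = required_kmod in loaded_kmods
        self.skip_reason = None
        if not self.active:
            self.skip_reason = "%s kmod not loaded" % required_kmod

    def get_images(self, run_cmd):
        """Return image names, or an empty list when the runtime is inactive."""
        if not self.active:
            return []
        return run_cmd("%s images" % self.name).splitlines()
```

With `SketchRuntime("podman", {"xfs"})` the image list comes back empty without any command being run, which is exactly the confusing outcome discussed above unless something like `skip_reason` is surfaced to the user.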
I would say let:

- improve reporting (on UI and in manifest/sos.log) of situations when `sos` skipped collecting some stuff due to a predicate. Warnings like `[plugin:networking] skipped command 'ip -s macsec show': required kmods missing: macsec. Use '--allow-system-changes' to enable collection.` don't cover all scenarios. Better reporting can prevent user confusion (a description of the situations with no warning is missing, though). This applies independently of the current issue, which just reminded me of the importance of the topic.
- keep with current status that we can load the `overlay` kmod - but document it (where?). Since the kmod is not needed every time: So, if a user manually removes the `overlay` kmod, then the kmod is no longer required by `podman`, nor by sosreport. (this is imho the most "ultimate" argument for the status quo answer to the philosophical question)
- `podman` behaves so confusingly..?