pmoravec opened this issue 1 year ago (status: open).
Optionally, we could add an environment variable (defaulting to the current value) to customize `sos_timeout` per avocado run. (That does not address my "too lengthy plugins" point, though.)
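A minimal sketch of such an override, assuming a hypothetical variable name `SOS_TEST_TIMEOUT` (not an existing sos or avocado setting) and keeping the 10-minute (600 s) value from the linked `full_report_run.py` as the default. `FullReportTest` is a hypothetical stand-in for the real avocado test class.

```python
import os

# Default taken from the 10-minute timeout in the linked full_report_run.py.
DEFAULT_SOS_TIMEOUT = 600


def sos_timeout_from_env(var="SOS_TEST_TIMEOUT", default=DEFAULT_SOS_TIMEOUT):
    """Return the per-run sos timeout in seconds.

    SOS_TEST_TIMEOUT is a hypothetical variable name. An unset or empty
    variable yields the default; anything else must be a positive integer.
    """
    raw = os.environ.get(var, "").strip()
    if not raw:
        return default
    value = int(raw)  # raises ValueError on non-numeric input
    if value <= 0:
        raise ValueError(f"{var} must be a positive number of seconds, got {value}")
    return value


class FullReportTest:
    # Hypothetical stand-in: the real test would inherit from the sos test
    # base class. The attribute is evaluated once, at import time.
    sos_timeout = sos_timeout_from_env()
```

Because the class attribute is read when the test module is imported, the variable has to be exported before launching avocado, e.g. `SOS_TEST_TIMEOUT=1800 avocado run tests/cleaner_tests/full_report/`.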
That's interesting. Most of my testing happens in a container on my laptop, and I have not seen any timeout issues like this so far, albeit it's an LXD container and not podman/buildah/docker.
Blaming the container is just a theory, as I don't know exactly the full environment where we noticed these timeouts. The lengthy plugins usually run much faster (especially `cgroups`), and their execution time scales up with the number of containers on the system, AFAIK.
How are these potentially problematic containers launched, exactly? Containers are in most respects the same as running on bare metal, so this kind of performance drop is surprising.
That being said, `cgroups` taking longer makes sense if there are dozens or even hundreds of containers running, as each container creates many new collections in the `cgroups` plugin; the same goes for `openshift`, `crio`, etc. if the container logs are requested. But plugins like `system`, `selinux`, and `process` are surprising to see.
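To check whether the `cgroups` slowdown tracks the number of containers, one rough diagnostic is to count the files under `/sys/fs/cgroup`, grouped by subtree. This is only a sketch: it assumes the plugin's workload grows with the number of files in that tree and does not reproduce the plugin's actual collection list. `count_cgroup_files` is a hypothetical helper.

```python
import os
from collections import Counter


def count_cgroup_files(root="/sys/fs/cgroup", depth=2):
    """Count regular files under root, grouped by the first `depth`
    path components relative to root (files directly in root count as ".").

    Rough proxy only: it assumes the cgroups plugin's workload grows with
    the number of files in this tree.
    """
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        parts = [] if rel == "." else rel.split(os.sep)
        key = os.sep.join(parts[:depth]) or "."
        counts[key] += len(filenames)
    return counts


if __name__ == "__main__":
    # Show the ten largest subtrees, e.g. cpuacct/system.slice on cgroup v1.
    for key, n in count_cgroup_files().most_common(10):
        print(n, key)
```

Running it on the affected machine and on a quieter one would show whether the heavy subtrees really multiply with the container count.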
We are still investigating this, but we can make `tests/report_tests/options_tests/options_tests.py:OptionsFromConfigTest` much faster in general by skipping many plugins (or enabling only those for which we have a particular test case). Raised https://github.com/sosreport/sos/pull/3288 for it.
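A sketch of what restricting the plugin set could look like when building the command a test runs. `--batch`, `--only-plugins` (`-o`) and `--skip-plugins` (`-n`) are existing `sos report` options; the helper `build_sos_cmd` and the plugin names in the example are illustrative assumptions, not what the linked PR actually does.

```python
import shlex


def build_sos_cmd(only_plugins=None, skip_plugins=None, extra=()):
    """Build an argv list for `sos report` (hypothetical helper).

    only_plugins: enable just these plugins (--only-plugins).
    skip_plugins: disable these plugins (--skip-plugins).
    extra: any further sos report arguments, appended unchanged.
    """
    cmd = ["sos", "report", "--batch"]
    if only_plugins:
        cmd += ["--only-plugins", ",".join(only_plugins)]
    if skip_plugins:
        cmd += ["--skip-plugins", ",".join(skip_plugins)]
    cmd += list(extra)
    return cmd


# Example: a test that only exercises option parsing needs very few plugins.
print(shlex.join(build_sos_cmd(only_plugins=["kernel", "host"])))
# prints: sos report --batch --only-plugins kernel,host
```

Enabling a small allow-list keeps the test's runtime independent of how many containers (and cgroups) the host happens to have.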
When running avocado tests in a container(*), this test easily times out despite having a 10-minute timeout (https://github.com/sosreport/sos/blob/main/tests/cleaner_tests/full_report/full_report_run.py#L25).
The main cause is that `sos report` takes 8 minutes (while the subsequent `clean` is supposed to run a few times longer, so even a 20-minute timeout might not be sufficient). We can increase the timeout as a defensive resolution, but to what value? Also, does it make sense to optimise the run somehow? The most lengthy plugins are:

(*) I think the fact that sos runs in a container vastly contributes to the duration of all those plugins (esp. `cgroups`).

Does it make sense to call this `sos` with an option such as `--plugin-timeout 60` (or maybe `90`)? For the sake of cleaner testing, we are not much interested in files like `/sys/fs/cgroup/cpuacct/system.slice/sys-kernel-config.mount/tasks` (collecting this file alone took over 2 seconds).
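The "to what value?" question can be made concrete with a little arithmetic. The sketch below assumes the clean phase takes `clean_factor` times as long as the report (the issue only says "a few times") and adds a hypothetical safety margin; it also shows appending `--plugin-timeout`, an existing `sos report` option that takes seconds. Both helper names are hypothetical.

```python
import math


def estimate_test_timeout(report_minutes, clean_factor, margin=1.25):
    """Rough total timeout in minutes for `sos report` plus the later clean.

    clean_factor: assumed ratio of clean duration to report duration.
    margin: hypothetical safety factor applied on top of the estimate.
    """
    total = report_minutes * (1 + clean_factor)
    return math.ceil(total * margin)


def with_plugin_timeout(cmd, seconds):
    """Return a copy of an sos argv list with --plugin-timeout appended."""
    return list(cmd) + ["--plugin-timeout", str(seconds)]


# 8 min report, clean assumed 2x longer: 8 * 3 = 24 min, * 1.25 -> 30 min.
print(estimate_test_timeout(8, 2))  # prints 30
print(with_plugin_timeout(["sos", "report", "--batch"], 60))
```

With these assumptions even a clean factor of 2 overshoots a 20-minute budget, which is why capping the slow plugins looks more attractive than only raising the test timeout.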