sysflow-telemetry / sysflow

SysFlow documentation and issues tracker

Kernel module "nouveau" is blacklisted with SysFlow #46

Closed weii666 closed 1 year ago

weii666 commented 4 years ago

Indicate project: collector

Describe the bug: SysFlow's installer ends up probing in kernel modules that are blacklisted at the host level. This is a major problem in this scenario: on node reboot, SysFlow often wins the race and the blacklisted kernel module is loaded, preventing the desired one from being used.

In this case the "bad" module is nouveau. To trigger it, start a node with an NVIDIA GPU and blacklist the nouveau module on the host. Without SysFlow, nouveau is not probed in, which allows the other module (nvidia) to eventually take over device control. With SysFlow agents running, the module is loaded on boot (not immediately, but around the time the container starts).

To reproduce: Exact reproduction steps are not provided.

Expected behavior: Blacklisted kernel modules should not be loaded.

ygelfand commented 4 years ago

To add details:

Environment: OpenShift 4.5, OS: RHCoreOS

Reproduce:
1. Ensure "nouveau" is blacklisted (in /etc/modprobe.d).
2. Boot a node with an NVIDIA GPU.
3. Verify "nouveau" isn't loaded.
4. Start the SysFlow agents.
5. Observe that "nouveau" is now loaded.
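
The "verify" steps above could be checked with something like the following. This is only a rough sketch, not part of SysFlow: the function names are made up for illustration, and it assumes the blacklist lives in `*.conf` files under /etc/modprobe.d as plain `blacklist <name>` lines and that loaded modules can be read from /proc/modules.

```python
from pathlib import Path


def parse_blacklist(conf_text):
    """Return module names from 'blacklist <name>' lines in modprobe.d-style text."""
    names = set()
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        if len(parts) == 2 and parts[0] == "blacklist":
            # modprobe treats '-' and '_' as interchangeable; /proc/modules uses '_'
            names.add(parts[1].replace("-", "_"))
    return names


def parse_loaded(proc_modules_text):
    """Return module names from /proc/modules-style text (first column of each line)."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def blacklisted_but_loaded(conf_dir="/etc/modprobe.d", proc_modules="/proc/modules"):
    """Return the set of modules that are blacklisted on the host yet currently loaded."""
    blacklist = set()
    for conf in sorted(Path(conf_dir).glob("*.conf")):
        blacklist |= parse_blacklist(conf.read_text())
    loaded = parse_loaded(Path(proc_modules).read_text())
    return blacklist & loaded
```

Running `blacklisted_but_loaded()` on the node before and after starting the agents should show whether "nouveau" moves from absent to present in the result.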

Expected behavior: I believe the agents are loading any modules matching the hardware, not just "nouveau". SysFlow shouldn't load any modules other than its own. Although this is only visible for blacklisted modules (the assumption being that the others are already loaded), it can lead to unexpected behavior.
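
One way to test the suspicion that more than "nouveau" is affected would be to compare the loaded-module list before and after the agents start. A minimal sketch, assuming two saved copies of /proc/modules are available as text (the helper names are hypothetical):

```python
def module_names(proc_modules_text):
    """Extract module names (first column) from /proc/modules-style text."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def newly_loaded(before_text, after_text):
    """Return a sorted list of modules present in the 'after' snapshot but not in 'before'."""
    return sorted(module_names(after_text) - module_names(before_text))
```

Anything in the result beyond SysFlow's own module would have been pulled in around the time the agents started, though this alone does not prove the agents caused the load.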

araujof commented 1 year ago

Closing due to inactivity. Please reopen if necessary.