Open liyi-ibm opened 5 years ago
N: That's actually a known bug somewhere in the timer subsystem. I don't think we have a root cause, because it's difficult to reproduce or trace. The watchdog timer gets lost when the CPU goes idle. It seems to be pretty harmless, so we can ignore it for now.
V: This did not look related to cgroup CFS balancing. I did not know about the new scenario you mentioned. I will work on this as an independent issue.
N: No, it's not; it's a bug we haven't been able to track down. It happens without cgroups at all. It doesn't appear to be too harmful, though, and doesn't seem to cause real lockups.
V: The CFS settings can help resolve the lockup caused by the CFS scheduler. You have not hit that yet, but you will if you run long enough.
A hard lockup only means that a CPU was doing something for a very long time (10 seconds) without running any workload. We are solving each scenario in which a CPU spends a lot of time. So the hard lockup message is a way to observe what the CPU is doing (memory allocation work, the CPU scheduler, etc.). The message you got above is a missed timer: the CPU was really idle, and it does not point to a problem where it was stuck doing kernel work. Hence it can be ignored. You should continue your tests with different loads and utilization levels to be sure that we can hit the problem and also solve it with different settings.
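To check the 10-second figure on a given test machine, something like the sketch below might help. It assumes a Linux system where the lockup detector exposes its threshold through the standard sysctl file `/proc/sys/kernel/watchdog_thresh`. The helper name `read_watchdog_thresh` is made up for illustration and is not part of any tool discussed in this thread.

```python
from pathlib import Path
from typing import Optional

# Standard Linux sysctl file for the lockup detector threshold (seconds).
# Assumption: the file can be absent if the detector is not built into the kernel.
WATCHDOG_THRESH = Path("/proc/sys/kernel/watchdog_thresh")


def read_watchdog_thresh(path: Path = WATCHDOG_THRESH) -> Optional[int]:
    """Return the watchdog threshold in seconds, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # Missing file, permission problem, or unexpected content.
        return None


if __name__ == "__main__":
    thresh = read_watchdog_thresh()
    if thresh is None:
        print("watchdog_thresh is not available on this system")
    else:
        print(f"hard lockup threshold: {thresh} s")
```

Recording this value alongside each test run makes it easier to compare lockup reports from machines that may have been tuned differently.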