Closed Smartich0ke closed 2 weeks ago
The problem started when I was running 1.6.3
Is it 1.6.2? Did you see the issues in v1.6.1 or v1.6.0?
There are some replica rebuilding events in the support bundle. Can you help check whether the high CPU usage of the instance-manager pods remains when no replica rebuild is in progress?
Yes sorry I meant 1.6.2. I started having the problem in 1.6.2 and then updated to 1.7.0 to see if it would go away. I rebooted the node once again and waited until everything settled down. Eventually the high CPU usage stopped and is back to normal.
@Smartich0ke Thanks for the update. When the issue happens again, could you keep the environment as is so that we can collect more information from your cluster? Thank you.
@Smartich0ke When you reboot the node, Longhorn needs to rebuild the replicas on the newly rebooted node, so CPU usage on that node will temporarily be high. After all 28 replicas are rebuilt, CPU usage on that node should go down. If it remains high, that is a problem. Please ping us if it remains high.
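For anyone wanting to check whether the CPU really stays high after the rebuilds finish, here is a rough sketch that flags busy instance-manager pods from the text output of `kubectl top pods -n longhorn-system`. It assumes the metrics API is available in the cluster and that the output has the usual NAME / CPU(cores) / MEMORY(bytes) columns. The function names and the 1000m threshold are made up for illustration and are not part of Longhorn.

```python
import sys


def parse_cpu_millicores(value):
    """Convert a Kubernetes CPU quantity such as '1523m' or '2' to millicores."""
    if value.endswith("m"):
        return int(value[:-1])
    return int(float(value) * 1000)


def busy_instance_managers(top_output, threshold_m=1000):
    """Return (pod_name, millicores) for instance-manager pods above threshold_m.

    threshold_m is a hypothetical cutoff; pick one that suits your nodes.
    """
    results = []
    for line in top_output.strip().splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if len(fields) < 2 or fields[0] == "NAME":
            continue
        name, cpu = fields[0], parse_cpu_millicores(fields[1])
        if name.startswith("instance-manager") and cpu > threshold_m:
            results.append((name, cpu))
    return results


if __name__ == "__main__":
    # Usage (assumed file name):
    #   kubectl top pods -n longhorn-system | python check_im_cpu.py
    for pod, cpu in busy_instance_managers(sys.stdin.read()):
        print(f"{pod}: {cpu}m")
```

Running it a few times after a reboot should show whether the usage drops once the rebuilds are done or stays high.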
Ok thanks for the help guys. I will report if it happens again. Closing for now.
Describe the bug
I have 3 nodes in my k3s cluster and one of them has very high CPU usage because of longhorn. I can hear the fans revving up like a jet engine on the node. Longhorn has been fine recently, so this is kind of out of the blue.
Nothing seems unusual in the logs, and I've only been able to narrow it down to the instance-manager on that one node. Longhorn is also using quite a lot of RAM, around 2-3GB on each node, but this has always been the case, so I suspect that is a different issue. I have tried fully rebooting the node and the issue still remains.
The problem started when I was running 1.6.2, so I updated to 1.7.0 to see if that would fix the issue, but it didn't. The instance manager uses a fair bit less CPU than it did before, but still an abnormally high amount. I have tried fully rebooting the node several times, and it has not solved the problem.
I ran `top` inside the instance-manager pod and here is a picture of the output:
And here are some grafana screenshots showing the high CPU usage
To Reproduce
Nothing to reproduce. I just let longhorn start normally and it happens.
Expected behavior
The instance-manager should use a more reasonable amount of CPU, like on the other nodes.
Support bundle for troubleshooting
supportbundle_1abee738-58cb-4a06-8e8b-3ceed44e6282_2024-08-24T22-18-15Z.zip
Environment