Closed Lickkylee closed 9 months ago
We had the same incident. The logs in /var/log/azure/Microsoft.Azure.Diagnostics.LinuxDiagnostic/extension.log
were full of the following, repeating:
2019/03/31 03:07:23 [Microsoft.Azure.Diagnostics.LinuxDiagnostic-3.0.119] Error in MDSD:teInstances failed
2019/03/31 03:07:23 2019-03-31T03:04:06.5832980Z: Error: OMI EnumerateInstances failed
2019/03/31 03:07:23 2019-03-31T03:04:06.6914160Z: Error: OMI EnumerateInstances failed
2019/03/31 03:07:23 2019-03-31T03:04:06.6916010Z: Error: OMI EnumerateInstances failed
2019/03/31 03:07:23 2019-03-31T03:04:06.7089990Z: Error: OMI EnumerateInstances failed
2019/03/31 03:07:23 2019-03-31T03:04:06.7090760Z: Error: OMI EnumerateInstances failed
2019/03/31 03:07:23 2019-03-31T03:05:36.5924470Z: Error: OMI EnumerateInstances failed
2019/03/31 03:07:23 2019-03-31T03:05:36.7084390Z: Error: OMI EnumerateInstances failed
2019/03/31 03:07:23 2019-03-31T03:05:36.7086500Z: Error: OMI EnumerateInstances failed
2019/03/31 03:07:23 2019-03-31T03:05:36.7174610Z: Error: OMI EnumerateInstances failed
2019/03/31 03:07:23 2019-03-31T03:05:36.7176520Z: Error: OMI EnumerateInstances failed
2019/03/31 03:07:23 2019-03-31T03:07:06.5961030Z: Error: OMI EnumerateInstances failed
2019/03/31 03:07:23 2019-03-31T03:07:06.7192690Z: Error: OMI EnumerateInstances failed
2019/03/31 03:07:23 2019-03-31T03:07:06.7195010Z: Error: OMI EnumerateInstances failed
2019/03/31 03:07:23 2019-03-31T03:07:06.7278580Z: Error: OMI EnumerateInstances failed
2019/03/31 03:07:23 2019-03-31T03:07:06.7279850Z: Error: OMI EnumerateInstances failed
2019/03/31 03:07:23
2019/03/31 03:07:23 [Microsoft.Azure.Diagnostics.LinuxDiagnostic-3.0.119] Daemon,success,1,message in mdsd.err:2019-03-31 03:07:06:teInstances failed
2019/03/31 03:07:23 2019-03-31T03:04:06.5832980Z: Error: OMI EnumerateInstances failed
2019/03/31 03:07:23 2019-03-31T03:04:06.6914160Z: Error: OMI EnumerateInstances failed
2019/03/31 03:07:23 2019-03-31T03:04:06.6916010Z: Error: OMI EnumerateInstances failed
2019/03/31 03:07:23 2019-03-31T03:04:06.7089990Z: Error: OMI EnumerateInstances failed
2019/03/31 03:07:23 2019-03-31T03:04:06.7090760Z: Error: OMI EnumerateInstances failed
2019/03/31 03:07:23 2019-03-31T03:05:36.5924470Z: Error: OMI EnumerateInstances failed
2019/03/31 03:07:23 2019-03-31T03:05:36.7084390Z: Error: OMI EnumerateInstances failed
2019/03/31 03:07:23 2019-03-31T03:05:36.7086500Z: Error: OMI EnumerateInstances failed
2019/03/31 03:07:23 2019-03-31T03:05:36.7174610Z: Error: OMI EnumerateInstances failed
2019/03/31 03:07:23 2019-03-31T03:05:36.7176520Z: Error: OMI EnumerateInstances failed
2019/03/31 03:07:23 2019-03-31T03:07:06.5961030Z: Error: OMI EnumerateInstances failed
2019/03/31 03:07:23 2019-03-31T03:07:06.7192690Z: Error: OMI EnumerateInstances failed
2019/03/31 03:07:23 2019-03-31T03:07:06.7195010Z: Error: OMI EnumerateInstances failed
2019/03/31 03:07:23 2019-03-31T03:07:06.7278580Z: Error: OMI EnumerateInstances failed
2019/03/31 03:07:23 2019-03-31T03:07:06.7279850Z: Error: OMI EnumerateInstances failed
Restarting the server fixed the errors and the high CPU usage.
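To see how often the error repeats, the log can be summarized by minute. The following is only a rough sketch: it assumes the line format matches the excerpt above exactly, and the helper name `count_omi_failures` is made up for illustration.

```python
import re
from collections import Counter

# Matches the inner timestamp (truncated to the minute) followed by the
# error text exactly as it appears in the excerpt above.
ERROR_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}):\d{2}\.\d+Z: Error: OMI EnumerateInstances failed"
)


def count_omi_failures(lines):
    """Count 'OMI EnumerateInstances failed' entries per minute.

    `lines` is any iterable of log lines, e.g. an open file object.
    Hypothetical helper, not part of the diagnostic extension.
    """
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

It could be used as `count_omi_failures(open(path, errors="replace"))` with `path` pointing at the extension.log mentioned above. Note that the extension writes the same inner entries more than once, so the counts show the repetition in the file rather than the number of distinct failures.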
The high CPU issue has been fixed in https://github.com/microsoft/pal/pull/117 and https://github.com/microsoft/pal/commit/6c0c108570ed3bb3850916677185f3f4134ca285.
OS Version: CentOS 7.3 (3.10.0-514.26.2.el7.x86_64)
OMI: OMI-1.0.8-6
scx: scx-1.6.2-337
We enabled the diagnostic extension on 08/02 and noticed that omiagent in our Azure VM has been consuming almost 100% CPU of one core since 10/23. This happened suddenly, without any change on our end. The issue has persisted for a long time and still affects us. Sometimes it is resolved after we restart the waagent service, which restarts the omi service as well.
We consulted an omiagent engineer, who suggested opening an issue with the providers team, since most high CPU usage is caused by the providers themselves. The diagnostic extension only calls SCX providers, so we are asking for help here.
our troubleshooting:
Does anyone have a clue what's happening or how to troubleshoot this?
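As a possible first troubleshooting step, here is a small sketch that lists which omiagent processes are busy, so the PID can then be inspected further. It assumes a procps-style `ps` that accepts `-eo pid,pcpu,comm`. The function names and the 50% threshold are arbitrary choices for illustration. Keep in mind that `ps` reports CPU use averaged over the lifetime of the process, so a recent spike may be easier to see in `top`.

```python
import subprocess


def parse_ps(output, name="omiagent", threshold=50.0):
    """Return [(pid, cpu_percent)] for matching processes at or above threshold.

    `output` is the text produced by `ps -eo pid,pcpu,comm`.
    Hypothetical helper for illustration only.
    """
    hits = []
    for line in output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        pid, cpu, comm = parts
        try:
            pid_val, cpu_val = int(pid), float(cpu)
        except ValueError:
            continue  # ignore rows that do not parse as numbers
        if name in comm and cpu_val >= threshold:
            hits.append((pid_val, cpu_val))
    return hits


def busy_omiagents(threshold=50.0):
    """Run ps and return the omiagent processes at or above threshold."""
    result = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm"],
        capture_output=True, text=True, check=True,
    )
    return parse_ps(result.stdout, threshold=threshold)
```

Calling `busy_omiagents()` on the affected VM would return a list such as `[(pid, cpu)]` for each omiagent process over the threshold, or an empty list if none is busy.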