microsoft / OMS-Agent-for-Linux

http://www.microsoft.com/oms

omiagent process leaves stale python processes #28

Closed ChrisHeitkamp closed 8 years ago

ChrisHeitkamp commented 8 years ago

Symptom: Every 5 minutes, two new python processes are added to the process list and never terminated. They all share the same parent PID.

omsagent 22076 21695 0 14:32 ? 00:00:00 [python]
....
omsagent 31358 21695 0 16:02 ? 00:00:00 [python]
omsagent 31360 21695 0 16:02 ? 00:00:00 [python]
omsagent 31862 21695 0 16:08 ? 00:00:00 [python]
omsagent 31864 21695 0 16:08 ? 00:00:00 [python]
omsagent 32401 21695 0 16:13 ? 00:00:00 [python]
omsagent 32403 21695 0 16:13 ? 00:00:00 [python]
...
[root@srv21 yum.repos.d]# ps -ef |grep omsagent | wc -l
69
[root@srv21 yum.repos.d]#

Same parent PID:
omsagent 21695 21663 0 14:28 ? 00:00:05 /opt/omi/bin/omiagent 11 14 --destdir / --providerdir /opt/omi/lib --idletimeout 90 --loglevel WARNING

I think the issue occurred after omsagent multi-homing with SCOM 2012 R2 was enabled. Restarting the omsagent does not remediate this. The omiserver has not been restarted yet.

Error in /var/opt/omi/log/omiserver.log, logged at the same intervals:
2016/02/22 17:33:35 [21663,21663] WARNING: null(0): EventId=30131 Priority=WARNING wsman: authentication failed for user [opsuser]

Even if this is related I would assume it is not desirable to fill up the process list.
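To check whether the leftover processes all hang off one parent, as in the listing above, the zombies can be tallied per parent PID. This is only a minimal sketch: the function name `count_zombies_by_parent` is made up for illustration, and it assumes a procps-style `ps` that accepts `-o stat= -o ppid=`.

```shell
#!/bin/bash
# Hypothetical helper (not part of the OMS agent): count zombie processes
# per parent PID. It reads "STAT PPID" pairs on stdin, so it works on live
# ps output as well as on a saved listing.
count_zombies_by_parent() {
    # Any state starting with Z (Z, Z+, Zs, ...) marks a defunct process.
    awk '$1 ~ /^Z/ { n[$2]++ } END { for (p in n) print p, n[p] }' |
        sort -k2,2nr   # parents with the most zombies first
}

# Live usage (assumes procps ps):
#   ps -e -o stat= -o ppid= | count_zombies_by_parent
```

Each output line is a parent PID followed by its zombie count, so a single omiagent PID with a steadily growing count would match the symptom described here.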

sjohner commented 8 years ago

Same issue here. Using Ubuntu 15.10, no SCOM management groups connected.

omsagent 3200 0.0 0.1 355952 10560 ? Sl 22:44 0:01 /opt/omi/bin/omiagent 10 13 --destdir / --providerdir /opt/omi/lib --idletimeout 90 --loglevel WARNING
omsagent 3428 0.0 0.0 0 0 ? Z 22:48 0:00 [python]
omsagent 3447 0.0 0.0 0 0 ? Z 22:48 0:00 [python]
omsagent 3812 0.0 0.0 0 0 ? Z 22:53 0:00 [python]
omsagent 3815 0.0 0.0 0 0 ? Z 22:53 0:00 [python]
omsagent 4141 0.0 0.0 0 0 ? Z 22:58 0:00 [python]
omsagent 4143 0.0 0.0 0 0 ? Z 22:58 0:00 [python]
omsagent 4443 0.0 0.0 0 0 ? Z 23:03 0:00 [python]
omsagent 4445 0.0 0.0 0 0 ? Z 23:03 0:00 [python]
omsagent 4873 0.0 0.0 0 0 ? Z 23:08 0:00 [python]
omsagent 4875 0.0 0.0 0 0 ? Z 23:08 0:00 [python]
omsagent 5231 0.0 0.0 0 0 ? Z 23:14 0:00 [python]
omsagent 5233 0.0 0.0 0 0 ? Z 23:14 0:00 [python]
omsagent 5524 0.0 0.0 0 0 ? Z 23:19 0:00 [python]
omsagent 5526 0.0 0.0 0 0 ? Z 23:19 0:00 [python]
omsagent 5813 0.0 0.0 0 0 ? Z 23:24 0:00 [python]
omsagent 5815 0.0 0.0 0 0 ? Z 23:24 0:00 _ [python]
omsagent 5838 0.6 0.4 1206560 40192 ? Sl 23:24 0:01 /opt/microsoft/omsagent/ruby/bin/ruby /opt/microsoft/omsagent/bin/omsagent -d /var/opt/microsoft/omsagent/run/omsagent.pid -o /var/opt/microsoft/omsagent/log/omsagent.log --no-supervisor

Reinstalling the OMS agent did not help: 5 minutes after installation, the first zombie processes show up. Is this related to omsagent being killed every 5 minutes?

Feb 22 22:48:50 donald systemd[1]: omsagent.service: Main process exited, code=killed, status=9/KILL
Feb 22 22:53:52 donald systemd[1]: omsagent.service: Main process exited, code=killed, status=9/KILL
Feb 22 22:58:55 donald systemd[1]: omsagent.service: Main process exited, code=killed, status=9/KILL
Feb 22 23:03:57 donald systemd[1]: omsagent.service: Main process exited, code=killed, status=9/KILL
Feb 22 23:08:59 donald systemd[1]: omsagent.service: Main process exited, code=killed, status=9/KILL
Feb 22 23:14:02 donald systemd[1]: omsagent.service: Main process exited, code=killed, status=9/KILL
Feb 22 23:19:04 donald systemd[1]: omsagent.service: Main process exited, code=killed, status=9/KILL

It seems as if the zombie processes result from the client.sh script:

omsagent 4443 0.0 0.0 0 0 ? Z 23:03 0:00 [python]
omsagent 4445 0.0 0.0 0 0 ? Z 23:03 0:00 [python]
omsagent 4873 0.0 0.0 0 0 ? Z 23:08 0:00 [python]
omsagent 4875 0.0 0.0 0 0 ? Z 23:08 0:00 [python]
omsagent 5231 0.0 0.0 0 0 ? Z 23:14 0:00 [python]
omsagent 5233 0.0 0.0 0 0 ? Z 23:14 0:00 [python]
omsagent 5524 0.0 0.0 0 0 ? Z 23:19 0:00 [python]
omsagent 5526 0.0 0.0 0 0 ? Z 23:19 0:00 [python]
omsagent 5813 0.2 0.1 51392 13008 ? S 23:24 0:00 python /opt/microsoft/omsconfig/Scripts/client.py 11
omsagent 5815 0.2 0.1 51392 12972 ? S 23:24 0:00 python /opt/microsoft/omsconfig/Scripts/client.py 11
omsagent 5838 0.7 0.4 1190020 32396 ? Sl 23:24 0:00 /opt/microsoft/omsagent/ruby/bin/ruby /opt/microsoft/omsagent/bin/omsagent -d /var/opt/microsoft/omsagent/run/omsagent.pid -o /var/opt/microsoft/omsagent/log/omsagent.log --no-supervisor

sjohner commented 8 years ago

Any update on this?

jeffaco commented 8 years ago

We're testing a new kit to resolve this problem. We'll loop back when we know we have the problem fixed and have something to post.

Thanks for your patience.

agup006 commented 8 years ago

Hi @sjohner,

We have verified the resolution of this issue with a private build. The fix is included in the next release, and we are happy to distribute a private build to you if needed.

jeffaco commented 8 years ago

Since this issue is resolved, I'm going to go ahead and close it.

@sjohner: If you need a private build, let Anurag know.

sjohner commented 8 years ago

Great, thank you guys!

@agup006 can you make any comment about the timeline of the next release?

c-arshaikh commented 6 years ago

The problem is not resolved yet. I have installed Python 2.7.15 (the most recent) and OMSAgent-1.4.1, but zombie processes belonging to omsagent still stick around in memory (RHEL 7.4).

silverl commented 6 years ago

I saw the same today. I put a script in /etc/cron.daily to restart the agent nightly and kill all of its processes, because stopping the omsagent service does not clean up all processes. My cron script looks like this:

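#!/bin/bash
# Stop every loaded unit whose name matches omsagent*.
systemctl stop 'omsagent*'
# Send SIGTERM to anything still running as the omsagent user.
pkill -u omsagent
# Start the agent units again.
systemctl start 'omsagent*'

c-arshaikh commented 6 years ago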

I have Ansible in my environment, and I normally run the ad-hoc command 'kill $(ps -eo stat,ppid|grep -w Z|awk '{print $2}'|tr "\n" " ")' to clean up any dead processes. My concern is fixing this issue at the root cause, since I have already updated to a newer version of the OMSAgent and its associated dependencies.
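Since the one-liner above passes the parent PIDs of the zombies to kill, it can help to see which parents would be signalled before running it. The following is a rough dry-run sketch, not a fix for the root cause; the function names `zombie_parents` and `show_zombie_parents` are hypothetical, and a procps-style `ps` is assumed.

```shell
#!/bin/bash
# Hypothetical helpers for illustration only (not part of OMSAgent).

# Print the unique parent PIDs of zombie processes, one per line.
# Input: "STAT PPID" pairs on stdin.
zombie_parents() {
    awk '$1 ~ /^Z/ { print $2 }' | sort -un
}

# Dry run: print PID and command line of each zombie parent
# instead of signalling it. Assumes procps ps.
show_zombie_parents() {
    local ppid
    ps -e -o stat= -o ppid= | zombie_parents |
        while read -r ppid; do
            ps -p "$ppid" -o pid= -o args=
        done
}

# Usage: show_zombie_parents
```

If the only parent listed is the omiagent process, restarting that single process may be enough to have the zombies reaped, though that is an assumption based on the listings earlier in this thread rather than something confirmed here.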