@salindaliyanage thanks for filing it! We are investigating.
I think we are seeing the same problem in our production environment. omiagent currently uses about 1.6 GB, eight days after the server was started. I saw the old ticket #597, where @JumpingYang001 suggested running strace, ltrace, and pmap. strace and ltrace gave me nothing useful, but pmap gave me this:
4696:   /opt/omi/bin/omiagent 9 10 --destdir / --providerdir /opt/omi/lib --loglevel WARNING
0000000000400000 736K r-x-- omiagent
00000000006b8000 40K rw--- omiagent
00000000006c2000 132K rw--- [ anon ]
0000000000992000 1352476K rw--- [ anon ]
00007f8fd4000000 132K rw--- [ anon ]
00007f8fd4021000 65404K ----- [ anon ]
00007f8fd8000000 132K rw--- [ anon ]
00007f8fd8021000 65404K ----- [ anon ]
00007f8fdc000000 132K rw--- [ anon ]
00007f8fdc021000 65404K ----- [ anon ]
00007f8fe0000000 132K rw--- [ anon ]
00007f8fe0021000 65404K ----- [ anon ]
00007f8fe4000000 684K rw--- [ anon ]
00007f8fe40ab000 64852K ----- [ anon ]
00007f8fe8757000 4K ----- [ anon ]
00007f8fe8758000 252K rw--- [ anon ]
00007f8fe8797000 4K ----- [ anon ]
00007f8fe8798000 252K rw--- [ anon ]
00007f8fe87d7000 4K ----- [ anon ]
00007f8fe87d8000 252K rw--- [ anon ]
00007f8fe8817000 92K r-x-- libresolv-2.27.so
00007f8fe882e000 2044K ----- libresolv-2.27.so
00007f8fe8a2d000 4K r---- libresolv-2.27.so
00007f8fe8a2e000 4K rw--- libresolv-2.27.so
00007f8fe8a2f000 8K rw--- [ anon ]
00007f8fe8a31000 20K r-x-- libnss_dns-2.27.so
00007f8fe8a36000 2048K ----- libnss_dns-2.27.so
00007f8fe8c36000 4K r---- libnss_dns-2.27.so
00007f8fe8c37000 4K rw--- libnss_dns-2.27.so
00007f8fe8c38000 44K r-x-- libnss_files-2.27.so
00007f8fe8c43000 2044K ----- libnss_files-2.27.so
00007f8fe8e42000 4K r---- libnss_files-2.27.so
00007f8fe8e43000 4K rw--- libnss_files-2.27.so
00007f8fe8e44000 24K rw--- [ anon ]
00007f8fe8e4a000 4K ----- [ anon ]
00007f8fe8e4b000 252K rw--- [ anon ]
00007f8fe8e8a000 4K ----- [ anon ]
00007f8fe8e8b000 252K rw--- [ anon ]
00007f8fe8eca000 1484K r---- LC_COLLATE
00007f8fe903d000 92K r-x-- libgcc_s.so.1
00007f8fe9054000 2044K ----- libgcc_s.so.1
00007f8fe9253000 4K r---- libgcc_s.so.1
00007f8fe9254000 4K rw--- libgcc_s.so.1
00007f8fe9255000 1652K r-x-- libm-2.27.so
00007f8fe93f2000 2044K ----- libm-2.27.so
00007f8fe95f1000 4K r---- libm-2.27.so
00007f8fe95f2000 4K rw--- libm-2.27.so
00007f8fe95f3000 1508K r-x-- libstdc++.so.6.0.25
00007f8fe976c000 2048K ----- libstdc++.so.6.0.25
00007f8fe996c000 40K r---- libstdc++.so.6.0.25
00007f8fe9976000 8K rw--- libstdc++.so.6.0.25
00007f8fe9978000 16K rw--- [ anon ]
00007f8fe997c000 28K r-x-- librt-2.27.so
00007f8fe9983000 2044K ----- librt-2.27.so
00007f8fe9b82000 4K r---- librt-2.27.so
00007f8fe9b83000 4K rw--- librt-2.27.so
00007f8fe9b84000 36K r-x-- libcrypt-2.27.so
00007f8fe9b8d000 2044K ----- libcrypt-2.27.so
00007f8fe9d8c000 4K r---- libcrypt-2.27.so
00007f8fe9d8d000 4K rw--- libcrypt-2.27.so
00007f8fe9d8e000 184K rw--- [ anon ]
00007f8fe9dbc000 152K r-x-- libmicxx.so
00007f8fe9de2000 2048K ----- libmicxx.so
00007f8fe9fe2000 4K rw--- libmicxx.so
00007f8fe9fe3000 132K rw--- [ anon ]
00007f8fea004000 2524K r-x-- libSCXCoreProviderModule.so
00007f8fea27b000 2048K ----- libSCXCoreProviderModule.so
00007f8fea47b000 124K rw--- libSCXCoreProviderModule.so
00007f8fea49a000 16K rw--- [ anon ]
00007f8fea49e000 16K r-x-- libcap-ng.so.0.0.0
00007f8fea4a2000 2044K ----- libcap-ng.so.0.0.0
00007f8fea6a1000 4K r---- libcap-ng.so.0.0.0
00007f8fea6a2000 4K rw--- libcap-ng.so.0.0.0
00007f8fea6a3000 116K r-x-- libaudit.so.1.0.0
00007f8fea6c0000 2048K ----- libaudit.so.1.0.0
00007f8fea8c0000 4K r---- libaudit.so.1.0.0
00007f8fea8c1000 4K rw--- libaudit.so.1.0.0
00007f8fea8c2000 40K rw--- [ anon ]
00007f8fea8cc000 1948K r-x-- libc-2.27.so
00007f8feaab3000 2048K ----- libc-2.27.so
00007f8feacb3000 16K r---- libc-2.27.so
00007f8feacb7000 8K rw--- libc-2.27.so
00007f8feacb9000 16K rw--- [ anon ]
00007f8feacbd000 2668K r-x-- libcrypto.so.1.1 (deleted)
00007f8feaf58000 2044K ----- libcrypto.so.1.1 (deleted)
00007f8feb157000 176K r---- libcrypto.so.1.1 (deleted)
00007f8feb183000 8K rw--- libcrypto.so.1.1 (deleted)
00007f8feb185000 12K rw--- [ anon ]
00007f8feb188000 516K r-x-- libssl.so.1.1 (deleted)
00007f8feb209000 2044K ----- libssl.so.1.1 (deleted)
00007f8feb408000 36K r---- libssl.so.1.1 (deleted)
00007f8feb411000 16K rw--- libssl.so.1.1 (deleted)
00007f8feb415000 52K r-x-- libpam.so.0.83.1
00007f8feb422000 2044K ----- libpam.so.0.83.1
00007f8feb621000 4K r---- libpam.so.0.83.1
00007f8feb622000 4K rw--- libpam.so.0.83.1
00007f8feb623000 12K r-x-- libdl-2.27.so
00007f8feb626000 2044K ----- libdl-2.27.so
00007f8feb825000 4K r---- libdl-2.27.so
00007f8feb826000 4K rw--- libdl-2.27.so
00007f8feb827000 104K r-x-- libpthread-2.27.so
00007f8feb841000 2044K ----- libpthread-2.27.so
00007f8feba40000 4K r---- libpthread-2.27.so
00007f8feba41000 4K rw--- libpthread-2.27.so
00007f8feba42000 16K rw--- [ anon ]
00007f8feba46000 164K r-x-- ld-2.27.so
00007f8feba83000 4K r---- LC_IDENTIFICATION
00007f8feba84000 4K r---- LC_MEASUREMENT
00007f8feba85000 4K r---- LC_TELEPHONE
00007f8feba86000 4K r---- LC_ADDRESS
00007f8feba87000 4K r---- LC_NAME
00007f8feba88000 4K r---- LC_PAPER
00007f8feba89000 4K r---- SYS_LC_MESSAGES
00007f8feba8a000 4K r---- LC_MONETARY
00007f8feba8b000 4K r---- LC_TIME
00007f8feba8c000 4K r---- LC_NUMERIC
00007f8feba8d000 28K r--s- gconv-modules.cache
00007f8feba94000 196K r---- LC_CTYPE
00007f8febac5000 1644K r---- locale-archive
00007f8febc60000 28K rw--- [ anon ]
00007f8febc6f000 4K r---- ld-2.27.so
00007f8febc70000 4K rw--- ld-2.27.so
00007f8febc71000 4K rw--- [ anon ]
00007fff70cca000 132K rw--- [ stack ]
00007fff70dfc000 12K r---- [ anon ]
00007fff70dff000 4K r-x-- [ anon ]
ffffffffff600000 4K r-x-- [ anon ]
total 1735484K
The interesting line is "0000000000992000 1352476K rw--- [ anon ]" which takes up the majority of the memory. Is there a way to check what is causing that usage?
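One way to peek inside such a large anonymous region (a sketch using standard tools, nothing omiagent-specific; PID 4696 and the region come from the pmap output above, and gcore ships with gdb):

# snapshot the live process into a core file without killing it
sudo gcore -o /tmp/omiagent 4696
# scan the core for repeated human-readable content that hints at what is accumulating
strings /tmp/omiagent.4696 | sort | uniq -c | sort -rn | head -50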
@salindaliyanage @jihu we have found the root cause; the dev team is fixing the issue.
@JumpingYang001 Is there an ETA for this fix? We are currently seeing the same behavior.
@confusedfella the fix is included in SCX 1.6.9-2, available from the Microsoft package repository. You can also refer to this link: https://docs.microsoft.com/en-us/windows-server/administration/Linux-Package-Repository-for-Microsoft-Software
I have the same issue.
I was unable to find an omi version newer than 1.6.9.1 on Ubuntu 20.04:
omi is already the newest version (1.6.9.1).
@nicon89 omi was not updated and stays at 1.6.9.1; only SCX was updated, to 1.6.9-2. You can try apt upgrade scx.
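For reference, a minimal sequence on Ubuntu to see which versions are installed and available, and to upgrade only scx (assuming the Microsoft repo from the docs linked above is configured):

# show installed vs. candidate versions for scx
apt policy scx
# refresh package lists, then upgrade only the scx package
sudo apt-get update
sudo apt-get install --only-upgrade scx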
Thank you. That helped.
Hi, I think we have the same issue: over the course of a week, our 4 GB Linux VMs run out of memory. It looks like our scx version is already the latest, so I'm not sure what else we can do.
Update: I ran the upgrade anyway, but also realized there are two versions of the package. How can we tell which one is being used by omiagent?
@dsuresh-ap run this command: dpkg -l|grep scx
I see; there is only one package, and it is 1.6.9.2 on the VM that I updated and 1.6.9.1 on the VMs that are not yet updated. I will update them.
@dsuresh-ap once you have updated to the new 1.6.9-2, you can check whether it fixes the issue. Thanks.
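To verify the fix after updating, a quick check (a sketch using standard tools):

# confirm the installed scx version
dpkg -l | grep scx
# watch the omiagent resident set size; on 1.6.9-2 it should stay roughly flat over days
ps -C omiagent -o pid,rss,etime,cmd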
Hi team, we are currently experiencing this issue on one of our production servers after upgrading. I'm planning to downgrade it to 1.6.8-1 to make it uniform with the rest of the servers.
I've researched how to downgrade, and what I found is this command:
yum downgrade scx
Will this downgrade to version 1.6.8-1? Is there a way to specify the exact version I intend to downgrade to (see the sketch after the package list below)? And is there any impact during the downgrade, given that this is in production?
I'm very new to the Linux environment and would appreciate your advice on how to downgrade.
Cheers
Repository 'epel' is missing name in configuration, using id
omi.x86_64 1.6.9-1 installed
omsagent.x86_64 1.14.11-0 installed
omsconfig.x86_64 1.1.1-932 installed
scx.x86_64 1.6.9-1 installed
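To answer the version question directly: yum can target an explicit version-release, as sketched below (whether the 1.6.8-1 build is still available in your configured repo is an assumption you should check first):

# list every scx build the configured repos offer
yum --showduplicates list scx
# downgrade to a specific version-release rather than just the previous one
sudo yum downgrade scx-1.6.8-1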
@eduardopaloma you should upgrade scx to 1.6.9-2 instead of downgrading. By the way, omi/scx come from the Microsoft repo, not the 'epel' repo.
Thanks for your reply. One question, though: is there any impact during the upgrade, such as a server restart? I ask because this is in production.
@eduardopaloma there should be no impact.
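If you want to confirm on a test machine that the upgrade does not restart anything, one way (a sketch assuming systemd, as in the service output shown later in this thread) is to compare the omid start timestamp before and after the upgrade:

# record when omid last (re)started, then upgrade scx and run this again;
# an unchanged timestamp means the service was not restarted
systemctl show omid -p ActiveEnterTimestamp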
I see a similar issue in OMI-1.6.9-1:
[ 02:16 - 10.73.6.36 ]
chend@ct-dev-eun-sonar-es-01 ~ $ /opt/omi/bin/omiagent --version
/opt/omi/bin/omiagent: OMI-1.6.9-1 - Tue Dec 21 08:51:40 PST 2021
[ 02:17 - 10.73.6.36 ]
chend@ct-dev-eun-sonar-es-01 ~ $ sudo service omid status
● omid.service - OMI CIM Server
Loaded: loaded (/lib/systemd/system/omid.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2022-07-11 18:19:07 UTC; 3 weeks 2 days ago
Main PID: 987 (omiserver)
Tasks: 8 (limit: 33683)
Memory: 3.7G
CGroup: /system.slice/omid.service
├─ 987 /opt/omi/bin/omiserver -d
├─ 989 /opt/omi/bin/omiengine -d --logfilefd 3 --socketpair 9
├─1611 /opt/omi/bin/omiagent 9 10 --destdir / --providerdir /opt/omi/lib --loglevel WARNING
└─1735 /opt/omi/bin/omiagent 9 10 --destdir / --providerdir /opt/omi/lib --loglevel WARNING
$ ps -o pid,user,%mem,command ax | sort -b -k3 -r
PID USER %MEM COMMAND
1735 root 13.8 /opt/omi/bin/omiagent 9 10 --destdir / --providerdir /opt/omi/lib --loglevel WARNING
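To measure the growth rate rather than take a single snapshot, a simple sampling loop works (a sketch; PID 1735 is the omiagent process from the ps output above):

# append the omiagent resident set size (in KB) to a log once a minute
while true; do
  echo "$(date -Is) $(ps -o rss= -p 1735)" >> /tmp/omiagent-rss.log
  sleep 60
done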
@duchenpaul the issue you show is a different one, and it is fixed in scx-1.6.9-2: https://github.com/microsoft/SCXcore/releases/tag/v1.6.9-2.
Hey folks, this bug is still there on stock Ubuntu 20.04 images in Azure.
They come preconfigured to pull from the http://azure.archive.ubuntu.com/ubuntu/ sources, not the https://packages.microsoft.com/ubuntu/... sources referenced in the documentation listed above.
Any plans to update the azure.archive.ubuntu.com package repository as well, so that this fix can be applied more broadly and more easily?
@bpkroth if you have an Azure VM with this issue, you can create a ticket on the portal; it will be transferred to the Azure team that manages the image.
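In the meantime, a possible workaround on an affected VM is to add the Microsoft package repository directly and pull the fixed scx from there (a sketch for Ubuntu 20.04, using the packages-microsoft-prod.deb mechanism described in the Microsoft docs linked earlier in this thread):

# register packages.microsoft.com as an apt source
wget https://packages.microsoft.com/config/ubuntu/20.04/packages-microsoft-prod.deb
sudo dpkg -i packages-microsoft-prod.deb
# refresh and upgrade only the scx package, which contains the fix
sudo apt-get update
sudo apt-get install --only-upgrade scx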
We recently re-onboarded multiple Linux VMs into a new Azure Log Analytics workspace. Prior to this, these VMs were part of the default Log Analytics workspaces set by Microsoft Defender for Cloud, using the following package versions.
We noticed that the re-onboarding process re-deployed the following latest package versions.
After re-onboarding, the omiagent process on every VM starts accumulating memory until it is killed by the out-of-memory killer. The problem was apparent on all the VMs, and downgrading the packages to the previous versions resolved it.
The latest onboard_agent.sh script (https://raw.githubusercontent.com/Microsoft/OMS-Agent-for-Linux/master/installer/scripts/onboard_agent.sh) uses OMSAgent v1.14.12-0, which installs OMI 1.6.9-1.
Can you please review whether the latest version of the OMI agent is affected by a memory leak? Thanks.
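Until that is confirmed, one way to keep the downgraded versions from being pulled forward again (a sketch assuming Debian/Ubuntu VMs; the package names are the ones discussed in this thread) is to hold them:

# prevent apt from upgrading omi and scx past the known-good versions
sudo apt-mark hold omi scx
# once a fixed release is confirmed, release the hold
sudo apt-mark unhold omi scx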