microsoft / omi

Open Management Infrastructure

Memory leak in omiagent #718

Closed: salindaliyanage closed this issue 2 years ago.

salindaliyanage commented 2 years ago

We have recently re-onboarded multiple Linux VMs into a new Azure Log Analytics workspace. Before this, the VMs were part of the default Log Analytics workspaces set up by Microsoft Defender for Cloud, using the following package versions:

omi-1.6.8-1.x86_64
omsagent-1.13.40-0.x86_64
omsconfig-1.1.1-930.x86_64
scx-1.6.4-7.x86_64

We noticed that the re-onboarding process redeployed the following (latest) package versions:

omi-1.6.9-1.x86_64
omsagent-1.14.12-0.x86_64
omsconfig-1.1.1-932.x86_64
scx-1.6.9-1.x86_64

After re-onboarding, the omiagent process on every VM keeps accumulating memory until it is killed by the out-of-memory (OOM) killer. The problem appeared on all the VMs, and downgrading the packages to the previous versions resolved it.
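A minimal sketch for confirming the growth pattern described above, assuming a Linux host: sample the omiagent process's resident set size (the VmRSS field of /proc/<pid>/status, reported in kB) at a fixed interval and check that it never drops while climbing past some threshold. The helper names, the 5-minute interval and the 10 MiB threshold are illustrative assumptions, not part of OMI, SCX or any Microsoft tooling, and steady growth is a symptom consistent with a leak rather than proof of one.

```python
import sys
import time
from typing import List


def parse_vmrss_kb(status_text: str) -> int:
    """Return VmRSS (resident set size, in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:\t 1352476 kB"
            return int(line.split()[1])
    raise ValueError("no VmRSS line in status text")


def sample_rss(pid: int, count: int, interval_s: float) -> List[int]:
    """Read VmRSS for `pid` `count` times, sleeping `interval_s` seconds between reads."""
    samples = []
    for i in range(count):
        with open(f"/proc/{pid}/status") as f:
            samples.append(parse_vmrss_kb(f.read()))
        if i + 1 < count:
            time.sleep(interval_s)
    return samples


def looks_like_steady_growth(samples_kb: List[int], min_growth_kb: int = 10_240) -> bool:
    """Heuristic: RSS never drops between samples and grows by at least `min_growth_kb` overall."""
    if len(samples_kb) < 2:
        return False
    never_drops = all(later >= earlier for earlier, later in zip(samples_kb, samples_kb[1:]))
    return never_drops and samples_kb[-1] - samples_kb[0] >= min_growth_kb


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python3 rss_watch.py <omiagent-pid>
    # 13 samples at 5-minute intervals cover one hour.
    rss = sample_rss(int(sys.argv[1]), count=13, interval_s=300.0)
    print("VmRSS samples (kB):", rss)
    print("steady growth" if looks_like_steady_growth(rss) else "no clear steady growth")
```

Pass the omiagent PID (from ps or systemctl status omid) as the only argument; the script name is hypothetical.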

The latest onboard_agent.sh script (https://raw.githubusercontent.com/Microsoft/OMS-Agent-for-Linux/master/installer/scripts/onboard_agent.sh) uses OMSAgent_v1.14.12-0, which installs the OMI_1.6.9-1 version.

Can you please check whether the latest version of the OMI agent has a memory leak? Thanks.

JumpingYang001 commented 2 years ago

@salindaliyanage thanks for filing this! We are investigating it.

jihu commented 2 years ago

I think we are seeing the same problem in our production environment. omiagent currently uses about 1.6 GB, 8 days after the server was started. I saw the old issue 597, where you (Jumping) suggested running strace, ltrace and pmap. strace and ltrace gave me nothing useful, but pmap gave me this:

```
4696:   /opt/omi/bin/omiagent 9 10 --destdir / --providerdir /opt/omi/lib --loglevel WARNING
0000000000400000     736K r-x-- omiagent
00000000006b8000      40K rw--- omiagent
00000000006c2000     132K rw--- [ anon ]
0000000000992000 1352476K rw--- [ anon ]
00007f8fd4000000     132K rw--- [ anon ]
00007f8fd4021000   65404K ----- [ anon ]
00007f8fd8000000     132K rw--- [ anon ]
00007f8fd8021000   65404K ----- [ anon ]
00007f8fdc000000     132K rw--- [ anon ]
00007f8fdc021000   65404K ----- [ anon ]
00007f8fe0000000     132K rw--- [ anon ]
00007f8fe0021000   65404K ----- [ anon ]
00007f8fe4000000     684K rw--- [ anon ]
00007f8fe40ab000   64852K ----- [ anon ]
00007f8fe8757000       4K ----- [ anon ]
00007f8fe8758000     252K rw--- [ anon ]
00007f8fe8797000       4K ----- [ anon ]
00007f8fe8798000     252K rw--- [ anon ]
00007f8fe87d7000       4K ----- [ anon ]
00007f8fe87d8000     252K rw--- [ anon ]
00007f8fe8817000      92K r-x-- libresolv-2.27.so
00007f8fe882e000    2044K ----- libresolv-2.27.so
00007f8fe8a2d000       4K r---- libresolv-2.27.so
00007f8fe8a2e000       4K rw--- libresolv-2.27.so
00007f8fe8a2f000       8K rw--- [ anon ]
00007f8fe8a31000      20K r-x-- libnss_dns-2.27.so
00007f8fe8a36000    2048K ----- libnss_dns-2.27.so
00007f8fe8c36000       4K r---- libnss_dns-2.27.so
00007f8fe8c37000       4K rw--- libnss_dns-2.27.so
00007f8fe8c38000      44K r-x-- libnss_files-2.27.so
00007f8fe8c43000    2044K ----- libnss_files-2.27.so
00007f8fe8e42000       4K r---- libnss_files-2.27.so
00007f8fe8e43000       4K rw--- libnss_files-2.27.so
00007f8fe8e44000      24K rw--- [ anon ]
00007f8fe8e4a000       4K ----- [ anon ]
00007f8fe8e4b000     252K rw--- [ anon ]
00007f8fe8e8a000       4K ----- [ anon ]
00007f8fe8e8b000     252K rw--- [ anon ]
00007f8fe8eca000    1484K r---- LC_COLLATE
00007f8fe903d000      92K r-x-- libgcc_s.so.1
00007f8fe9054000    2044K ----- libgcc_s.so.1
00007f8fe9253000       4K r---- libgcc_s.so.1
00007f8fe9254000       4K rw--- libgcc_s.so.1
00007f8fe9255000    1652K r-x-- libm-2.27.so
00007f8fe93f2000    2044K ----- libm-2.27.so
00007f8fe95f1000       4K r---- libm-2.27.so
00007f8fe95f2000       4K rw--- libm-2.27.so
00007f8fe95f3000    1508K r-x-- libstdc++.so.6.0.25
00007f8fe976c000    2048K ----- libstdc++.so.6.0.25
00007f8fe996c000      40K r---- libstdc++.so.6.0.25
00007f8fe9976000       8K rw--- libstdc++.so.6.0.25
00007f8fe9978000      16K rw--- [ anon ]
00007f8fe997c000      28K r-x-- librt-2.27.so
00007f8fe9983000    2044K ----- librt-2.27.so
00007f8fe9b82000       4K r---- librt-2.27.so
00007f8fe9b83000       4K rw--- librt-2.27.so
00007f8fe9b84000      36K r-x-- libcrypt-2.27.so
00007f8fe9b8d000    2044K ----- libcrypt-2.27.so
00007f8fe9d8c000       4K r---- libcrypt-2.27.so
00007f8fe9d8d000       4K rw--- libcrypt-2.27.so
00007f8fe9d8e000     184K rw--- [ anon ]
00007f8fe9dbc000     152K r-x-- libmicxx.so
00007f8fe9de2000    2048K ----- libmicxx.so
00007f8fe9fe2000       4K rw--- libmicxx.so
00007f8fe9fe3000     132K rw--- [ anon ]
00007f8fea004000    2524K r-x-- libSCXCoreProviderModule.so
00007f8fea27b000    2048K ----- libSCXCoreProviderModule.so
00007f8fea47b000     124K rw--- libSCXCoreProviderModule.so
00007f8fea49a000      16K rw--- [ anon ]
00007f8fea49e000      16K r-x-- libcap-ng.so.0.0.0
00007f8fea4a2000    2044K ----- libcap-ng.so.0.0.0
00007f8fea6a1000       4K r---- libcap-ng.so.0.0.0
00007f8fea6a2000       4K rw--- libcap-ng.so.0.0.0
00007f8fea6a3000     116K r-x-- libaudit.so.1.0.0
00007f8fea6c0000    2048K ----- libaudit.so.1.0.0
00007f8fea8c0000       4K r---- libaudit.so.1.0.0
00007f8fea8c1000       4K rw--- libaudit.so.1.0.0
00007f8fea8c2000      40K rw--- [ anon ]
00007f8fea8cc000    1948K r-x-- libc-2.27.so
00007f8feaab3000    2048K ----- libc-2.27.so
00007f8feacb3000      16K r---- libc-2.27.so
00007f8feacb7000       8K rw--- libc-2.27.so
00007f8feacb9000      16K rw--- [ anon ]
00007f8feacbd000    2668K r-x-- libcrypto.so.1.1 (deleted)
00007f8feaf58000    2044K ----- libcrypto.so.1.1 (deleted)
00007f8feb157000     176K r---- libcrypto.so.1.1 (deleted)
00007f8feb183000       8K rw--- libcrypto.so.1.1 (deleted)
00007f8feb185000      12K rw--- [ anon ]
00007f8feb188000     516K r-x-- libssl.so.1.1 (deleted)
00007f8feb209000    2044K ----- libssl.so.1.1 (deleted)
00007f8feb408000      36K r---- libssl.so.1.1 (deleted)
00007f8feb411000      16K rw--- libssl.so.1.1 (deleted)
00007f8feb415000      52K r-x-- libpam.so.0.83.1
00007f8feb422000    2044K ----- libpam.so.0.83.1
00007f8feb621000       4K r---- libpam.so.0.83.1
00007f8feb622000       4K rw--- libpam.so.0.83.1
00007f8feb623000      12K r-x-- libdl-2.27.so
00007f8feb626000    2044K ----- libdl-2.27.so
00007f8feb825000       4K r---- libdl-2.27.so
00007f8feb826000       4K rw--- libdl-2.27.so
00007f8feb827000     104K r-x-- libpthread-2.27.so
00007f8feb841000    2044K ----- libpthread-2.27.so
00007f8feba40000       4K r---- libpthread-2.27.so
00007f8feba41000       4K rw--- libpthread-2.27.so
00007f8feba42000      16K rw--- [ anon ]
00007f8feba46000     164K r-x-- ld-2.27.so
00007f8feba83000       4K r---- LC_IDENTIFICATION
00007f8feba84000       4K r---- LC_MEASUREMENT
00007f8feba85000       4K r---- LC_TELEPHONE
00007f8feba86000       4K r---- LC_ADDRESS
00007f8feba87000       4K r---- LC_NAME
00007f8feba88000       4K r---- LC_PAPER
00007f8feba89000       4K r---- SYS_LC_MESSAGES
00007f8feba8a000       4K r---- LC_MONETARY
00007f8feba8b000       4K r---- LC_TIME
00007f8feba8c000       4K r---- LC_NUMERIC
00007f8feba8d000      28K r--s- gconv-modules.cache
00007f8feba94000     196K r---- LC_CTYPE
00007f8febac5000    1644K r---- locale-archive
00007f8febc60000      28K rw--- [ anon ]
00007f8febc6f000       4K r---- ld-2.27.so
00007f8febc70000       4K rw--- ld-2.27.so
00007f8febc71000       4K rw--- [ anon ]
00007fff70cca000     132K rw--- [ stack ]
00007fff70dfc000      12K r---- [ anon ]
00007fff70dff000       4K r-x-- [ anon ]
ffffffffff600000       4K r-x-- [ anon ]
 total           1735484K
```

The interesting line is "0000000000992000 1352476K rw--- [ anon ]", which takes up the majority of the memory (1352476K of the 1735484K total). Is there a way to check what is causing that usage?
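To triage any pmap dump like the one above, the output can be ranked by mapping size. The sketch below is a hypothetical helper, not an OMI or procps tool: it parses default "pmap <pid>" output and reports the largest mappings with their share of the total (for the dump above, 0000000000992000 comes to roughly 78% of 1735484K). pmap alone cannot say what allocated that region. A large anonymous mapping just above the binary's own segments is often the brk heap, and /proc/<pid>/maps, where the kernel labels that region [heap], is one way to check that interpretation.

```python
import re
import subprocess
import sys
from typing import List, Tuple

# One mapping row of default `pmap <pid>` output, e.g.
# "0000000000992000 1352476K rw--- [ anon ]" -> address, size in kB, mode, name
MAPPING_RE = re.compile(r"^([0-9a-f]{16})\s+(\d+)K\s+(\S{5})\s+(.+)$")
TOTAL_RE = re.compile(r"total\s+(\d+)K")


def largest_mappings(pmap_text: str, top: int = 3) -> List[Tuple[str, int, str]]:
    """Return the `top` largest mappings as (address, size_kb, name), biggest first."""
    mappings = []
    for line in pmap_text.splitlines():
        m = MAPPING_RE.match(line.strip())
        if m:
            address, size_kb, _mode, name = m.groups()
            mappings.append((address, int(size_kb), name))
    mappings.sort(key=lambda row: row[1], reverse=True)
    return mappings[:top]


def share_of_total(pmap_text: str, size_kb: int) -> float:
    """Fraction of the final 'total NNNK' figure that `size_kb` represents."""
    m = TOTAL_RE.search(pmap_text)
    if m is None:
        raise ValueError("no 'total' line in pmap output")
    return size_kb / int(m.group(1))


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python3 pmap_top.py <pid>   (needs procps `pmap` on PATH)
    out = subprocess.run(["pmap", sys.argv[1]], capture_output=True, text=True, check=True).stdout
    for address, size_kb, name in largest_mappings(out):
        print(f"{address} {size_kb:>9}K {name}  ({share_of_total(out, size_kb):.1%} of total)")
```

Fed the dump above, largest_mappings() would put 0000000000992000 (1352476K) first, followed by the 65404K reserved regions.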

JumpingYang001 commented 2 years ago

@salindaliyanage @jihu we have found the root cause, and the developers are working on a fix.

confusedfella commented 2 years ago

@JumpingYang001 Is there an ETA for this fix? We are currently seeing the same behavior.

JumpingYang001 commented 2 years ago

@confusedfella customers can get the fix by updating to SCX 1.6.9-2 from the Microsoft package repository (MSRepo). See also: https://docs.microsoft.com/en-us/windows-server/administration/Linux-Package-Repository-for-Microsoft-Software

nicon89 commented 2 years ago

I have the same issue.

nicon89 commented 2 years ago

I was unable to find an omi version newer than 1.6.9.1 on Ubuntu 20.04:

```
omi is already the newest version (1.6.9.1).
```

JumpingYang001 commented 2 years ago

@nicon89 omi has not been updated and stays at 1.6.9.1; SCX has been updated to 1.6.9-2, so you can try apt upgrade scx.

nicon89 commented 2 years ago

Thank you. That helped.

dsuresh-ap commented 2 years ago

Hi, I think we have the same issue: within a week, our 4 GB Linux VMs run out of memory. Our scx version already appears to be the latest, so I'm not sure what else we can do.

Update: I ran the upgrade anyway, but I also noticed that there are two versions of the package. How can we tell which one omiagent is using?

JumpingYang001 commented 2 years ago

@dsuresh-ap run this command: dpkg -l|grep scx

dsuresh-ap commented 2 years ago

I see: there is only one package, and it is 1.6.9.2 on the VM that I updated and 1.6.9.1 on the VMs that have not been updated yet. I will update those.
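Because the fix ships in scx rather than omi, the quickest check is the scx version that dpkg reports. The sketch below is a hypothetical helper, not something Microsoft provides: it extracts the installed scx version from "dpkg -l scx" output and compares it numerically against 1.6.9-2, treating the 1.6.9.2 shown by dpkg in this thread as the same release. It compares numeric components only and is not a full implementation of dpkg's version ordering (epochs, tildes and alphanumeric suffixes are ignored).

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# scx 1.6.9-2 is the release the maintainers point to in this thread.
FIXED_SCX = (1, 6, 9, 2)


def version_tuple(version: str) -> Tuple[int, ...]:
    """'1.6.9-2' or '1.6.9.2' -> (1, 6, 9, 2).

    Simplification: only purely numeric components are kept; this is NOT
    dpkg's full version-ordering algorithm.
    """
    return tuple(int(part) for part in re.split(r"[.-]", version) if part.isdigit())


def scx_version_from_dpkg(dpkg_output: str) -> Optional[str]:
    """Return the version column of the installed ('ii') scx row, or None."""
    for line in dpkg_output.splitlines():
        fields = line.split()
        # Rows look like: "ii  scx  1.6.9.2  amd64  <description>"
        if len(fields) >= 3 and fields[0] == "ii" and fields[1].split(":")[0] == "scx":
            return fields[2]
    return None


def has_fix(version: str) -> bool:
    """True if `version` is numerically at or above 1.6.9-2."""
    return version_tuple(version) >= FIXED_SCX


if __name__ == "__main__" and shutil.which("dpkg"):
    # `dpkg -l scx` lists only the scx package (non-zero exit if dpkg does not know it).
    result = subprocess.run(["dpkg", "-l", "scx"], capture_output=True, text=True)
    installed = scx_version_from_dpkg(result.stdout)
    if installed is None:
        print("scx does not appear to be installed")
    elif has_fix(installed):
        print(f"scx {installed}: at or above 1.6.9-2")
    else:
        print(f"scx {installed}: older than 1.6.9-2, consider upgrading")
```

On the yum-based hosts in this thread, rpm -q scx prints the equivalent information in name-version-release.arch form, which this parser does not handle.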

JumpingYang001 commented 2 years ago

@dsuresh-ap once you have updated to 1.6.9-2, please check whether it fixes the issue. Thanks.

deepakjain111 commented 2 years ago

https://github.com/microsoft/SCXcore/releases/tag/v1.6.9-2

eduardopaloma commented 2 years ago

Hi team, we are currently experiencing this issue on one of our production servers after upgrading. I'm planning to downgrade it to 1.6.8-1 so that it is consistent with the rest of the servers.

I've researched how to downgrade, and this is the command I found:

yum downgrade scx

Will this downgrade to version 1.6.8-1? Is there any way to specify the version I want to downgrade to? Will the downgrade have any impact, given that this is a production server?

I'm very new to the Linux environment and would appreciate your advice on how to do this downgrade.

Cheers

```
Repository 'epel' is missing name in configuration, using id
omi.x86_64         1.6.9-1      installed
omsagent.x86_64    1.14.11-0    installed
omsconfig.x86_64   1.1.1-932    installed
scx.x86_64         1.6.9-1      installed
```

JumpingYang001 commented 2 years ago

@eduardopaloma you should upgrade scx to 1.6.9-2 instead of downgrading. BTW, omi/scx are in the Microsoft repo, not the 'epel' repo.

eduardopaloma commented 2 years ago

Thanks for your reply. One question, though: will the upgrade have any impact, such as a server restart? I ask because this is a production server.

JumpingYang001 commented 2 years ago

@eduardopaloma there should be no impact.

duchenpaul commented 1 year ago

I see a similar issue in OMI-1.6.9-1:

```
[ 02:16 - 10.73.6.36  ]
chend@ct-dev-eun-sonar-es-01 ~ $ /opt/omi/bin/omiagent --version
/opt/omi/bin/omiagent: OMI-1.6.9-1 - Tue Dec 21 08:51:40 PST 2021
[ 02:17 - 10.73.6.36  ]
chend@ct-dev-eun-sonar-es-01 ~ $ sudo service omid status
● omid.service - OMI CIM Server
     Loaded: loaded (/lib/systemd/system/omid.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2022-07-11 18:19:07 UTC; 3 weeks 2 days ago
   Main PID: 987 (omiserver)
      Tasks: 8 (limit: 33683)
     Memory: 3.7G
     CGroup: /system.slice/omid.service
             ├─ 987 /opt/omi/bin/omiserver -d
             ├─ 989 /opt/omi/bin/omiengine -d --logfilefd 3 --socketpair 9
             ├─1611 /opt/omi/bin/omiagent 9 10 --destdir / --providerdir /opt/omi/lib --loglevel WARNING
             └─1735 /opt/omi/bin/omiagent 9 10 --destdir / --providerdir /opt/omi/lib --loglevel WARNING

$ ps -o pid,user,%mem,command ax | sort -b -k3 -r
    PID USER     %MEM COMMAND
   1735 root     13.8 /opt/omi/bin/omiagent 9 10 --destdir / --providerdir /opt/omi/lib --loglevel WARNING
```

I used pmap to check; here is the output:

```
chend@ct-dev-eun-sonar-es-01 ~ $ sudo pmap 1735
1735:   /opt/omi/bin/omiagent 9 10 --destdir / --providerdir /opt/omi/lib --loglevel WARNING
0000000000400000     736K r-x-- omiagent
00000000006b8000      40K rw--- omiagent
00000000006c2000     132K rw--- [ anon ]
0000000000a0c000 3968504K rw--- [ anon ]
00007feb44000000     132K rw--- [ anon ]
00007feb44021000   65404K ----- [ anon ]
00007feb4c000000     856K rw--- [ anon ]
00007feb4c0d6000   64680K ----- [ anon ]
00007feb52b75000       4K ----- [ anon ]
00007feb52b76000     252K rw--- [ anon ]
00007feb52bb5000       4K ----- [ anon ]
00007feb52bb6000     252K rw--- [ anon ]
00007feb52bf5000       4K r---- LC_IDENTIFICATION
00007feb52bf6000       4K r---- LC_MEASUREMENT
00007feb52bf7000       4K r---- LC_TELEPHONE
00007feb52bf8000       4K r---- LC_ADDRESS
00007feb52bf9000       4K r---- LC_NAME
00007feb52bfa000       4K r---- LC_PAPER
00007feb52bfb000       4K r---- SYS_LC_MESSAGES
00007feb52bfc000       4K r---- LC_MONETARY
00007feb52bfd000    1484K r---- LC_COLLATE
00007feb52d70000       4K r---- LC_TIME
00007feb52d71000      28K r--s- gconv-modules.cache
00007feb52d78000     200K r---- LC_CTYPE
00007feb52daa000    2968K r---- locale-archive
00007feb53090000      12K r---- libgcc_s.so.1
00007feb53093000      72K r-x-- libgcc_s.so.1
00007feb530a5000      16K r---- libgcc_s.so.1
00007feb530a9000       4K r---- libgcc_s.so.1
00007feb530aa000       4K rw--- libgcc_s.so.1
00007feb530ab000      52K r---- libm-2.31.so
00007feb530b8000     668K r-x-- libm-2.31.so
00007feb5315f000     612K r---- libm-2.31.so
00007feb531f8000       4K r---- libm-2.31.so
00007feb531f9000       4K rw--- libm-2.31.so
00007feb531fa000     600K r---- libstdc++.so.6.0.28
00007feb53290000     964K r-x-- libstdc++.so.6.0.28
00007feb53381000     292K r---- libstdc++.so.6.0.28
00007feb533ca000       4K ----- libstdc++.so.6.0.28
00007feb533cb000      44K r---- libstdc++.so.6.0.28
00007feb533d6000      12K rw--- libstdc++.so.6.0.28
00007feb533d9000      12K rw--- [ anon ]
00007feb533dc000       8K r---- librt-2.31.so
00007feb533de000      16K r-x-- librt-2.31.so
00007feb533e2000       8K r---- librt-2.31.so
00007feb533e4000       4K r---- librt-2.31.so
00007feb533e5000       4K rw--- librt-2.31.so
00007feb533e6000       8K r---- libcrypt.so.1.1.0
00007feb533e8000      84K r-x-- libcrypt.so.1.1.0
00007feb533fd000     104K r---- libcrypt.so.1.1.0
00007feb53417000       4K r---- libcrypt.so.1.1.0
00007feb53418000       4K rw--- libcrypt.so.1.1.0
00007feb53419000      32K rw--- [ anon ]
00007feb53421000     152K r-x-- libmicxx.so
00007feb53447000    2048K ----- libmicxx.so
00007feb53647000       4K rw--- libmicxx.so
00007feb53648000     132K rw--- [ anon ]
00007feb53669000    2524K r-x-- libSCXCoreProviderModule.so
00007feb538e0000    2048K ----- libSCXCoreProviderModule.so
00007feb53ae0000     124K rw--- libSCXCoreProviderModule.so
00007feb53aff000      24K rw--- [ anon ]
00007feb53b05000       8K r---- libcap-ng.so.0.0.0
00007feb53b07000      12K r-x-- libcap-ng.so.0.0.0
00007feb53b0a000       4K r---- libcap-ng.so.0.0.0
00007feb53b0b000       4K r---- libcap-ng.so.0.0.0
00007feb53b0c000       4K rw--- libcap-ng.so.0.0.0
00007feb53b0d000      12K r---- libaudit.so.1.0.0
00007feb53b10000      32K r-x-- libaudit.so.1.0.0
00007feb53b18000      80K r---- libaudit.so.1.0.0
00007feb53b2c000       4K ----- libaudit.so.1.0.0
00007feb53b2d000       4K r---- libaudit.so.1.0.0
00007feb53b2e000       4K rw--- libaudit.so.1.0.0
00007feb53b2f000      48K rw--- [ anon ]
00007feb53b3b000     136K r---- libc-2.31.so
00007feb53b5d000    1504K r-x-- libc-2.31.so
00007feb53cd5000     312K r---- libc-2.31.so
00007feb53d23000      16K r---- libc-2.31.so
00007feb53d27000       8K rw--- libc-2.31.so
00007feb53d29000      16K rw--- [ anon ]
00007feb53d2d000     480K r---- libcrypto.so.1.1
00007feb53da5000    1644K r-x-- libcrypto.so.1.1
00007feb53f40000     580K r---- libcrypto.so.1.1
00007feb53fd1000     176K r---- libcrypto.so.1.1
00007feb53ffd000       8K rw--- libcrypto.so.1.1
00007feb53fff000      16K rw--- [ anon ]
00007feb54003000     112K r---- libssl.so.1.1
00007feb5401f000     316K r-x-- libssl.so.1.1
00007feb5406e000     104K r---- libssl.so.1.1
00007feb54088000       4K ----- libssl.so.1.1
00007feb54089000      36K r---- libssl.so.1.1
00007feb54092000      16K rw--- libssl.so.1.1
00007feb54096000      12K r---- libpam.so.0.84.2
00007feb54099000      36K r-x-- libpam.so.0.84.2
00007feb540a2000      16K r---- libpam.so.0.84.2
00007feb540a6000       4K r---- libpam.so.0.84.2
00007feb540a7000       4K rw--- libpam.so.0.84.2
00007feb540a8000       4K r---- libdl-2.31.so
00007feb540a9000       8K r-x-- libdl-2.31.so
00007feb540ab000       4K r---- libdl-2.31.so
00007feb540ac000       4K r---- libdl-2.31.so
00007feb540ad000       4K rw--- libdl-2.31.so
00007feb540ae000      24K r---- libpthread-2.31.so
00007feb540b4000      68K r-x-- libpthread-2.31.so
00007feb540c5000      24K r---- libpthread-2.31.so
00007feb540cb000       4K r---- libpthread-2.31.so
00007feb540cc000       4K rw--- libpthread-2.31.so
00007feb540cd000      24K rw--- [ anon ]
00007feb540db000       4K r---- ld-2.31.so
00007feb540dc000     140K r-x-- ld-2.31.so
00007feb540ff000      32K r---- ld-2.31.so
00007feb54107000       4K r---- LC_NUMERIC
00007feb54108000       4K r---- ld-2.31.so
00007feb54109000       4K rw--- ld-2.31.so
00007feb5410a000       4K rw--- [ anon ]
00007ffcc20b5000     132K rw--- [ stack ]
00007ffcc2161000      16K r---- [ anon ]
00007ffcc2165000       8K r-x-- [ anon ]
ffffffffff600000       4K r-x-- [ anon ]
 total           4122716K
```

JumpingYang001 commented 1 year ago

@duchenpaul the issue you are showing is a different issue, and it is fixed in scx-1.6.9-2: https://github.com/microsoft/SCXcore/releases/tag/v1.6.9-2.

bpkroth commented 1 year ago

Hey folks, this bug is still present on stock Ubuntu 20.04 images in Azure. They come preconfigured to pull from the http://azure.archive.ubuntu.com/ubuntu/ sources, not from the https://packages.microsoft.com/ubuntu/... sources referenced in the documentation linked above.

Are there any plans to update the azure.archive.ubuntu.com package repository as well, so that this fix can be applied more easily and more broadly?

JumpingYang001 commented 1 year ago

@bpkroth if you have an Azure VM with this issue, you can create a ticket in the portal, and it will be transferred to the Azure team that manages the image.