Closed srice01 closed 4 years ago
Is there any activity on this? Even after almost a year we are still having these same problems with a number of our nodes.
This is still a very prominent issue in Azure. Is there seriously no work being put into this anymore? It's a broken tool that's causing production VMs to run out of disk space.
We have ended up creating a cron job to delete the core files (hopefully frequently enough to avoid HD filling) rather than waiting for a fix from Microsoft that it appears will never come.
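For reference, a cleanup job along these lines is what we use (the thread only says "a cron job", so the schedule and the one-hour retention window here are assumptions; adjust to taste):

```
# /etc/cron.d/clean-omi-cores (hypothetical file name)
# Every 15 minutes, delete omiagent core dumps older than 60 minutes.
*/15 * * * * root find /var/opt/omi/run -maxdepth 1 -name 'core.*' -mmin +60 -delete
```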
@srice01 Are you still having this issue with the latest versions?
Yes (I am using CentOS 7.6.1810).
[root]# rpm -qa | grep -i omi
omi-1.6.2-0.x86_64
[root]# rpm -qa | grep -i scx
scx-1.6.3-659.x86_64
[root]# rpm -qa | grep -i walinux
WALinuxAgent-2.2.42-1.el7.noarch
[root]# rpm -qa | grep -i oms
auoms-2.0.0-13.x86_64
omsagent-1.11.0-9.x86_64
omsconfig-1.1.1-926.x86_64
[root]# ls -al /var/opt/omi/run/
total 404888
drwxr-xr-x. 3 omi  omi      4096 Sep 23 16:31 .
drwxr-xr-x. 8 root root       81 May 30 04:23 ..
-rw------- 1 root root 30789632 Sep 23 08:01 core.101250
-rw------- 1 root root 30789632 Sep 23 08:16 core.104089
-rw------- 1 root root 30814208 Sep 23 08:31 core.106826
-rw------- 1 root root 30728192 Sep 23 08:46 core.109601
-rw------- 1 root root 30711808 Sep 23 09:01 core.112330
-rw------- 1 root root 30728192 Sep 23 09:16 core.115170
-rw------- 1 root root 30793728 Sep 23 09:31 core.117975
-rw------- 1 root root 30801920 Sep 23 09:46 core.120825
-rw------- 1 root root 30814208 Sep 23 10:01 core.123533
-rw------- 1 root root 30814208 Sep 23 11:46 core.12592
...
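To quantify how fast the dumps eat disk, a quick count-and-sum over that directory works (a sketch; /var/opt/omi/run is the directory from the listing above, overridable via RUN_DIR):

```shell
# Count the omiagent core dumps and total their size in MiB.
RUN_DIR=${RUN_DIR:-/var/opt/omi/run}
find "$RUN_DIR" -maxdepth 1 -name 'core.*' -printf '%s\n' 2>/dev/null \
  | awk '{n++; s+=$1} END {printf "%d core files, %.1f MiB\n", n, s/1048576}'
```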
@srice01 Could you please open a support ticket and tell them to engage me (joburati)? That way I will be able to follow up with the devs internally and get this issue worked on.
I am assuming you mean for me to create a support ticket in Azure. This is support request 119092422001455.
Thanks @srice01, will get in touch with you via the ticket and try to get this moving.
@srice01 Good news, I could fix the problem on your image.
The issue is that the DSCForLinux extension installs version 1.1.1-294 of the dsc package, and this version causes omiagent to segfault. Installing version 1.1.1-926 fixes the issue.
All those cases are related to this issue:
I have already submitted a fix to bump up the version of the dsc package:
I am following up with PG internally for them to merge and push the fix:
Meanwhile you can fix the issue by installing the package manually:
wget https://github.com/microsoft/PowerShell-DSC-for-Linux/releases/download/v1.1.1-926/dsc-1.1.1-926.ssl_098.x64.rpm
yum upgrade dsc-1.1.1-926.ssl_098.x64.rpm -y
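To confirm whether a node still needs this manual upgrade, you can compare the installed dsc version against the fixed one (a sketch; 1.1.1-926 is the fixed version cited above, and sort -V does the version comparison):

```shell
# Check whether the installed dsc package is already at or above the fix.
fixed="1.1.1-926"
installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}' dsc 2>/dev/null) || installed="0"
newest=$(printf '%s\n' "$fixed" "$installed" | sort -V | tail -n1)
if [ "$installed" != "0" ] && [ "$newest" = "$installed" ]; then
  echo "dsc $installed already includes the fix"
else
  echo "dsc needs upgrading to $fixed"
fi
```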
I hope this helps.
@johanburati - This is indeed good news. Given that the DSCForLinux extension is installed by Azure (not ourselves) I take it your changes are to make sure the fixed version is installed by default in future?
@srice01 yes
Once my patch is merged and a new release of the DSCForLinux extension is pushed by the devs, it will be fixed for good. Until then you will have to bump up the version of the package manually.
If you are having this issue check https://github.com/Azure/azure-linux-extensions/issues/875 for details and solution.
It has been 24 hours since installing the update and I have seen no core dumps, so I believe this is now resolved.
Copied over from https://github.com/Microsoft/omi/issues/491 (please see this for full communication on this issue).
On our RM provisioned VMs in Azure we noticed that the root partition is filling up with large numbers of "core.###" files in the /var/opt/omi/run directory.
Further investigation shows segmentation faults (in /var/log/messages) as follows:
Environment information:
Operating System: CentOS Release 7.4.1708 (fully patched, that is, "yum update" shows no updates pending).
So far the workaround has been to write a cron job (!) to periodically wipe the core files but obviously this is not an ideal situation.
Further information from "JumpingYang001":
Following debug info shows omiagent loaded scx provider:
(gdb) info sharedlibrary
(*): Shared library is missing debugging information.
(gdb)
The crash is at 0x00007fa58e405e00, which is in /lib64/libnss_dns.so.2; that matches the segmentation faults in your /var/log/messages.
Here are the threads:
Please let me know if any further debug information is required.
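For anyone wanting to reproduce the analysis above, a core dump can be inspected roughly like this (a sketch; the omiagent binary path is an assumption based on the default omi install layout, and the core file name is one of those listed earlier):

```
# Binary path assumed; substitute your actual core file name.
gdb /opt/omi/bin/omiagent /var/opt/omi/run/core.101250
(gdb) info sharedlibrary        # confirm which providers are loaded
(gdb) bt                        # backtrace of the faulting thread
(gdb) thread apply all bt       # backtraces for all threads
```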