microsoft / OMS-Agent-for-Linux

http://www.microsoft.com/oms

Linux Hybrid Worker only register in System hybrid worker groups cannot use with runbook #1261

Open desmphil opened 3 years ago

desmphil commented 3 years ago

When I deploy the OMS Agent to Linux (Ubuntu 16 or 18), using either the on-premises package or the Azure extension, the machine registers with Log Analytics and the Automation Account (with Update Management). The Linux machine is registered in the System hybrid worker groups.

However, when I run the registration on the Linux host for a User Hybrid Worker, it fails because the worker is already registered. I cannot register it in the User hybrid worker groups.

I don't really need the Update Management feature, but I do need a User Hybrid Worker on Linux to run runbook jobs.

If I un-register the current worker and then register it again, the Linux host appears under User hybrid worker groups and reports a registration time only once. After that, the worker remains disconnected and does not heartbeat.

Runbook jobs stay Queued forever.

Running python UM_Linux_Troubleshooter_AUM.py script reports everything ok and worker registered.
Passed: Microsoft Monitoring agent is running
Passed: Machine registered with log analytics workspace:['0b8ca96b-5beb-430e-b71d-3d92708faac6']
Passed: Hybrid worker package is present
Passed: Hybrid worker is running
Passed: Machine is connected to internet
Passed: TCP test for {agentsvc.azure-automation.net} (port 443) succeeded
Passed: TCP test for {cc-jobruntimedata-prod-su1.azure-automation.net} (port 443) succeeded
Passed: TCP test for {.ods.opinsights.azure.com} (port 443) succeeded
Passed: TCP test for {.oms.opinsights.azure.com} (port 443) succeeded
Passed: TCP test for {ods.systemcenteradvisor.com} (port 443) succeeded
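For anyone who wants to repeat the port 443 checks by hand, here is a minimal Python sketch of a TCP reachability test. It is not the troubleshooter's own code; `tcp_check`, `report`, and `ENDPOINTS` are names made up for this example, and the host list only contains the fully spelled-out endpoints from the output above. Note that a passing TCP test only shows the endpoint is reachable; it says nothing about whether the worker is actually heartbeating.

```python
import socket


def tcp_check(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # OSError covers DNS failures, refused connections, and timeouts
        return False


# Hypothetical list for illustration: hosts copied from the troubleshooter output
ENDPOINTS = [
    "agentsvc.azure-automation.net",
    "cc-jobruntimedata-prod-su1.azure-automation.net",
    "ods.systemcenteradvisor.com",
]


def report(endpoints=ENDPOINTS, port=443):
    """Map each endpoint to True/False depending on TCP reachability."""
    return {host: tcp_check(host, port) for host in endpoints}
```

Calling `report()` on the affected host returns a dictionary of host names to booleans that can be compared with the "Passed" lines above.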

kamsalisbury commented 3 years ago

Similar issue using CentOS 8, running python3. I was able to onboard the host to Log Analytics and register it as a hybrid worker, both without error. The hybrid worker status is green in Azure Automation. The first time I run a runbook on the hybrid worker, it stops reporting and the runbook actions never execute. I de-registered and then re-registered the hybrid worker, and its status went green in Azure Automation again. A second attempt to run a runbook on the hybrid worker gave the same result.

Current status: Log Analytics shows a good heartbeat. CentOS 8 (ps) shows the processes are there. Automation says the hybrid worker has not been seen in 1 day.

ps -ef | grep python;
nxautom+ 3470602 1 0 Oct21 ? 00:11:30 python3 /opt/microsoft/omsconfig/modules/nxOMSAutomationWorker/DSCResources/MSFT_nxOMSAutomationWorkerResource/automationworker/3.x/worker/main.py /var/opt/microsoft/omsagent/state/automationworker/oms.conf rworkspace: 1.6.10.6
nxautom+ 3470638 3470602 0 Oct21 ? 01:26:49 python3 /opt/microsoft/omsconfig/modules/nxOMSAutomationWorker/DSCResources/MSFT_nxOMSAutomationWorkerResource/automationworker/3.x/worker/hybridworker.py /var/opt/microsoft/omsagent/state/automationworker/worker.conf managed rworkspace: rversion:1.6.10.6
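The ps listing above shows two worker processes, main.py and hybridworker.py. A rough Python sketch for checking that both appear in `ps -ef` style output follows. The function name and the choice of these two script names as "expected" are assumptions based only on this listing, not on any documented agent behaviour.

```python
import os

# Assumption: these are the two scripts seen in the ps output in this thread
EXPECTED_SCRIPTS = ("main.py", "hybridworker.py")


def worker_scripts_running(ps_lines):
    """Return the set of expected worker scripts found in ps -ef style lines."""
    found = set()
    for line in ps_lines:
        # Only consider processes launched from the automationworker tree
        if "/automationworker/" not in line:
            continue
        for token in line.split():
            name = os.path.basename(token)
            if name in EXPECTED_SCRIPTS:
                found.add(name)
    return found
```

As this thread shows, both processes can be present while Azure Automation still reports the worker as not seen, so this check is necessary but not sufficient.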

tail ...omsagent.log;
2020-11-13 09:23:41 -0500 [info]: Sending OMS Heartbeat succeeded at 2020-11-13T14:23:41.868Z
2020-11-13 09:24:41 -0500 [info]: Sending OMS Heartbeat succeeded at 2020-11-13T14:24:41.869Z

tail ...omsconfig.log;
2020/11/13 09:20:44: INFO: /opt/microsoft/omsconfig/Scripts/3.x/Scripts/nxOMSAutomationWorker.py(844): nxautomation was found on the system
2020/11/13 09:20:44: DEBUG: /opt/microsoft/omsconfig/Scripts/3.x/Scripts/nxOMSAutomationWorker.py(844): running version is: 1.6.10.6
2020/11/13 09:20:44: DEBUG: /opt/microsoft/omsconfig/Scripts/3.x/Scripts/nxOMSAutomationWorker.py(844): latest available version is: 1.6.10.6
2020/11/13 09:20:44: DEBUG: /opt/microsoft/omsconfig/Scripts/3.x/Scripts/nxOMSAutomationWorker.py(844): Test_Marshall returned [0]
2020/11/13 09:20:44: FATAL: /opt/microsoft/omsconfig/Scripts/3.x/Scripts/nxOMSAuditdPlugin.py(388): Invalid workspace id
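To pull the ERROR and FATAL entries out of a long omsconfig.log, a small filter like the following may help. This is a sketch that assumes each entry sits on one line in the `date time: LEVEL: source(line): message` shape seen in the excerpt above; the regular expression and the function name are made up for this example and are not part of the agent.

```python
import re

# Assumed entry shape, inferred from the omsconfig.log excerpt in this thread:
#   2020/11/13 09:20:44: FATAL: /path/to/script.py(388): message text
ENTRY = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}): "
    r"(?P<level>[A-Z]+): "
    r"(?P<source>.+?)\((?P<line>\d+)\): "
    r"(?P<message>.*)$"
)


def serious_entries(lines, levels=("ERROR", "FATAL")):
    """Return (timestamp, level, source, message) for entries at the given levels."""
    hits = []
    for raw in lines:
        match = ENTRY.match(raw.strip())
        if match and match.group("level") in levels:
            hits.append(
                (
                    match.group("ts"),
                    match.group("level"),
                    match.group("source"),
                    match.group("message"),
                )
            )
    return hits
```

Run against the five lines above, it keeps only the FATAL entry, which comes from nxOMSAuditdPlugin.py rather than from nxOMSAutomationWorker.py.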

desmphil commented 3 years ago

@kamsalisbury have you onboarded your Log Analytics workspace and Automation Account with Update Management? If so, you must enable the AzureAutomation solution on the Log Analytics workspace:

Set-AzOperationalInsightsIntelligencePack -ResourceGroupName -WorkspaceName -IntelligencePackName "AzureAutomation" -Enabled $true

Then purge the agent on the endpoint, clean up the hybrid worker in Azure Automation, and restart the whole process. It should finally re-register, and then you can execute runbooks.

kamsalisbury commented 3 years ago

Thank you, I will try these steps and report back. To the PS prompt!

desmphil commented 3 years ago

Thank you, I will try these steps and report back. To the cloud shell!

Worked for me in 2 different tenants. The documentation doesn't mention a workspace linked with Azure Automation when you use the Update Management solution.

Linux needs extra care. I learned the hard way too 👍

kamsalisbury commented 3 years ago

Thank you for the guidance. I think the result is that support for CentOS 8 has not yet fully arrived (and that is OK; I can wait, install CentOS 7, or both).

Re-executed; Set-AzOperationalInsightsIntelligencePack -ResourceGroupName -WorkspaceName -IntelligencePackName "AzureAutomation" -Enabled $true

Name            Enabled
AzureAutomation True

Re-executed; Set-AzOperationalInsightsIntelligencePack -ResourceGroupName -WorkspaceName -IntelligencePackName "Updates" -Enabled $true

Name    Enabled
Updates True

De-registered Hybrid Worker;
python /opt/microsoft/omsconfig/modules/nxOMSAutomationWorker/DSCResources/MSFT_nxOMSAutomationWorkerResource/automationworker/scripts/onboarding.py --deregister --endpoint="" --key="" --groupname="" --workspaceid=""
Successfuly deregistered worker.
Cleaning up left over directories.
Removed state directory.
Removed working directory.

In the Azure portal, the Hybrid Worker Group only had the one Linux host and no longer shows under "User hybrid worker groups". However, "System hybrid worker groups" still shows this same Linux host, last seen "1 minute ago". I manually deleted this object after the agent purge.

sudo onboard_agent.sh --purge
...
Shell bundle exiting with code 0

rm -rf /opt/microsoft

sudo onboard_agent.sh -w -s
...
Shell bundle exiting with code 0

Azure Log Analytics shows the agent heartbeat and the solutions "updates", "azureAutomation", and "securityCenterFree".

About 15 minutes later...

sudo python /opt/microsoft/omsconfig/modules/nxOMSAutomationWorker/DSCResources/MSFT_nxOMSAutomationWorkerResource/automationworker/scripts/onboarding.py --register -w -k -g -e
[Errno 2] No such file or directory

tail .../omsagent.log
2020-11-13 11:17:07 -0500 [info]: Sending OMS Heartbeat succeeded at 2020-11-13T16:17:07.836Z

tail .../omsconfig.log
Error Message Executing Get-Action returned success but didn't return any status. Error Code : 6
2020/11/13 11:16:29: ERROR: null(0): EventId=1 Priority=ERROR Job : This event indicates that failure happens when LCM is trying to get the configuration from pull server using download manager null. ErrorId is 6. ErrorDetail is Executing Get-Action returned success but didn't return any status.
[2020/11/13 11:17:08] [357041] [INFO] [0] [/opt/microsoft/omsconfig/Scripts/python3/TestDscConfiguration.py:0] dsc_host lock file is acquired by : TestConfiguration
2020/11/13 11:17:09: ERROR: null(0): EventId=1 Priority=ERROR Job : DSC Engine Error : Error Message Current configuration does not exist. Execute Start-DscConfiguration command with -Path parameter to specify a configuration file and create a current configuration first. Error Code : 6
[2020/11/13 11:17:09] [357041] [INFO] [0] [/opt/microsoft/omsconfig/Scripts/python3/TestDscConfiguration.py:0] dsc_host failed with code = 6

Azure Automation Update Management now shows two Linux hosts with the same name: one not assessed (the latest agent install) and the other compliant (from earlier this morning).

sudo /opt/microsoft/omsagent/bin/troubleshooter
...
Running troubleshooter in silent mode...

CHECKING INSTALLATION...
Checking if running a supported OS version...
ERROR(S) FOUND.

================================================================================
ALL ERRORS/WARNINGS ENCOUNTERED:
ERROR FOUND: This version of CentOS Linux (8.2.2004) is not supported. Please download 6 or 7.
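The check that failed here boils down to comparing the distro's major version against a list of supported ones. A minimal Python sketch of that comparison is below. It is not the troubleshooter's actual logic; the `SUPPORTED_MAJOR` table is hypothetical and filled in only from the error message above ("Please download 6 or 7"), so check the agent's supported OS list before relying on it.

```python
# Hypothetical table, taken only from the troubleshooter message in this thread
SUPPORTED_MAJOR = {"CentOS Linux": {6, 7}}


def major_version(version):
    """Extract the major version number from a string such as '8.2.2004'."""
    return int(version.split(".")[0])


def is_supported(distro, version, table=SUPPORTED_MAJOR):
    """Return True if the distro's major version is in the supported table."""
    return major_version(version) in table.get(distro, set())
```

With this table, `is_supported("CentOS Linux", "8.2.2004")` is False, which matches the error reported above.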