NetworkManager disabled on eth0. Modify base images to enable it.

edburns commented 4 years ago

There are sporadic CI/CD failures that have been traced to the following root cause:

NetworkManager was disabled on eth0.

Suggested remedy:

1. Edit the file /etc/sysconfig/network-scripts/ifcfg-eth0 and set NM_CONTROLLED=yes
2. Recapture the image
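
The first step of the remedy could be scripted roughly as follows. This is a minimal sketch, not the procedure actually used for the image: the helper name `set_nm_controlled` and the sample file contents are hypothetical, and it works on the file's text so it can be tried without touching a real system.

```python
def set_nm_controlled(ifcfg_text: str, value: str = "yes") -> str:
    """Return ifcfg text with NM_CONTROLLED set to `value`.

    An existing NM_CONTROLLED line is replaced (duplicates are dropped);
    if the key is absent, it is appended. Commented-out lines are left alone.
    """
    out = []
    found = False
    for line in ifcfg_text.splitlines():
        if line.strip().startswith("NM_CONTROLLED="):
            if not found:
                out.append(f"NM_CONTROLLED={value}")
                found = True
            # Any further NM_CONTROLLED lines are duplicates and are skipped.
        else:
            out.append(line)
    if not found:
        out.append(f"NM_CONTROLLED={value}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical contents; the real ifcfg-eth0 on the image may differ.
    sample = "DEVICE=eth0\nONBOOT=yes\nBOOTPROTO=dhcp\nNM_CONTROLLED=no\n"
    print(set_nm_controlled(sample), end="")
```

On a real VM, the result would be written back to /etc/sysconfig/network-scripts/ifcfg-eth0 as root before the image is recaptured.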

Detailed Analysis

Hope you are doing well. Thank you for your time during the remote session; it was a pleasure talking with you. Regarding your issue, I have identified the root cause. Please see the details below:

I checked the VM adminservervm and found that the computer name was set correctly during the deployment:

Invoking action VirtualMachines.ResourceOperation.PUT(subscriptionId=685ba005-af8d-4b04-8f16-a7bf38b2eb5a, resourceGroupName=wls-270132377-281-owls-122130-8u131-ol74, resourceName=adminServerVM, vm={ "location": "eastus", "plan": { "name": "owls-122130-8u131-ol74", "publisher": "oracle", "product": "weblogic-122130-jdk8u131-ol74" }, "properties": { "hardwareProfile": { "vmSize": "Standard_A3" }, "storageProfile": { "imageReference": { "publisher": "oracle", "offer": "weblogic-122130-jdk8u131-ol74", "sku": "owls-122130-8u131-ol74", "version": "latest" }, "osDisk": { "createOption": "FromImage", "managedDisk": { "storageAccountType": "Standard_LRS" } }, "dataDisks": [ { "lun": 0, "createOption": "FromImage", "managedDisk": { "storageAccountType": "Standard_LRS" }, "diskSizeGB": 900 } ] }, "osProfile": { "computerName": "adminServerVM", "adminUsername": "weblogic" }, "networkProfile": {"networkInterfaces":[{"id":"/subscriptions/685ba005-af8d-4b04-8f16-a7bf38b2eb5a/resourceGroups/wls-270132377-281-owls-122130-8u131-ol74/providers/Microsoft.Network/networkInterfaces/adminServerVMNIC"}]}, "diagnosticsProfile": { "bootDiagnostics": { "enabled": true, "storageUri": "https://593261olvm.blob.core.windows.net/" } }, "provisioningState": 0 } })

I then checked the system log and found that the file /etc/hostname was modified after the VM restart.

Image attachment 1

The system log shows that the hostname was changed by NetworkManager.

Image attachment 2

I have reproduced the issue using the Oracle Linux 7.4 image. That means the issue comes from the official image, not from your end.
After discussing with Linux SME Frank, we found that the root cause is that NetworkManager was disabled on eth0.

Action Plan:

Edit the file /etc/sysconfig/network-scripts/ifcfg-eth0 and set NM_CONTROLLED=yes
Recapture the image.

edburns commented 4 years ago

From @galiacheng 👍

Update for new findings:

The “managedservervm2” VM that we investigated was able to reach “adminservervm” by the hostname “adminservervm” after a restart.
This is because, after the restart, a new search option, “internal.cloudapp.net”, was appended to /etc/resolv.conf on “managedservervm2”.

To make “adminservervm.internal.cloudapp.net” resolvable by the DNS resolver on managedservervm2 without a restart, we can add the search option “internal.cloudapp.net” to /etc/resolv.conf.
The DNS resolver will then append the suffix to the hostname automatically when managedservervm2 accesses “adminservervm”.
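
The effect of the search option can be illustrated with a small sketch. It is a simplified model, not the resolver's actual implementation: the helper names `add_search_domain` and `search_candidates` and the sample nameserver address are hypothetical, and only the suffix-appending step for a short hostname is modelled.

```python
from typing import List


def add_search_domain(resolv_conf: str, domain: str) -> str:
    """Return resolv.conf text whose search list contains `domain`.

    The last `search` line is extended, since that is the one the resolver
    uses; if there is no `search` line, one is appended.
    """
    lines = resolv_conf.splitlines()
    search_idx = None
    for i, line in enumerate(lines):
        fields = line.split()
        if fields and fields[0] == "search":
            search_idx = i
    if search_idx is None:
        lines.append(f"search {domain}")
    else:
        fields = lines[search_idx].split()
        if domain not in fields[1:]:
            lines[search_idx] = " ".join(fields + [domain])
    return "\n".join(lines) + "\n"


def search_candidates(hostname: str, resolv_conf: str) -> List[str]:
    """List the suffixed names formed from a short hostname and the search list."""
    domains: List[str] = []
    for line in resolv_conf.splitlines():
        fields = line.split()
        if fields and fields[0] == "search":
            domains = fields[1:]  # a later search line overrides an earlier one
    return [f"{hostname}.{d}" for d in domains]


if __name__ == "__main__":
    # Hypothetical starting contents of /etc/resolv.conf.
    before = "nameserver 192.0.2.1\n"
    after = add_search_domain(before, "internal.cloudapp.net")
    print(after, end="")
    print(search_candidates("adminservervm", after))
```

With the search option in place, the short name “adminservervm” yields the candidate “adminservervm.internal.cloudapp.net”, which is the name that resolves.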

According to the Microsoft.Network networkInterfaces template reference, we can specify “internalDomainNameSuffix” in the ARM template for communication between VMs in the same network.
After I specified “internalDomainNameSuffix” with the value “internal.cloudapp.net” and deployed the ARM template, /etc/resolv.conf on the VMs was updated with the search option “internal.cloudapp.net”.
The adminservervm, with hostname “adminservervm.internal.cloudapp.net”, is reachable from other VMs even when they connect using the short hostname “adminservervm”.
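
As a rough illustration of where this setting sits, the sketch below builds a partial network interface resource and prints it as JSON. It is an assumption-laden fragment rather than the template from this repository: placing the property under `dnsSettings` follows the template reference as I understand it, the NIC name is taken from the deployment log above, and required fields such as the API version, location, and IP configurations are deliberately omitted.

```python
import json


def nic_dns_fragment(nic_name: str, suffix: str) -> dict:
    """Build a partial networkInterfaces resource carrying only the DNS suffix.

    This is not deployable on its own: apiVersion, location, and
    ipConfigurations are omitted and must come from the real template.
    """
    return {
        "type": "Microsoft.Network/networkInterfaces",
        "name": nic_name,
        "properties": {
            "dnsSettings": {
                "internalDomainNameSuffix": suffix,
            },
        },
    }


if __name__ == "__main__":
    fragment = nic_dns_fragment("adminServerVMNIC", "internal.cloudapp.net")
    print(json.dumps(fragment, indent=2))
```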

Because changing /etc/sysconfig/network-scripts/ifcfg-eth0 requires capturing a new image, I will instead specify “internalDomainNameSuffix” in the template to resolve the error.

Your quick debugging of the root cause helped me a lot in finding this setting.
Feel free to call me if you need more information.
Thank you again.

Regards,
Galia

jacobt123 commented 4 years ago

Offer ID: weblogic-122130-jdk8u131-ol74. Edited the file /etc/sysconfig/network-scripts/ifcfg-eth0 to set NM_CONTROLLED=yes, captured a new image, and published it on Partner Center.

wls-eng / arm-oraclelinux-wls

NetworkManager disabled on eth0. Modify base images to enable it. #169