wls-eng / arm-oraclelinux-wls

Microsoft Azure ARM Templates to create Oracle Linux VM with pre-installed WebLogic Server
Apache License 2.0

NetworkManager disabled on eth0. Modify base images to enable it. #169

Closed edburns closed 4 years ago

edburns commented 4 years ago

There are sporadic CI/CD failures that have been traced to the following root cause:

NetworkManager was disabled on eth0.

Suggested remedy:

1. Edit the file /etc/sysconfig/network-scripts/ifcfg-eth0 and set NM_CONTROLLED=yes
2. Recapture the image
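
For anyone scripting step 1 before recapturing, here is a minimal sketch of the edit in Python. It only rewrites the text of an ifcfg file passed in as a string; the function name `set_nm_controlled` and the sample file contents are assumptions for illustration, not taken from the actual base image.

```python
def set_nm_controlled(ifcfg_text: str, value: str = "yes") -> str:
    """Return ifcfg text with exactly one NM_CONTROLLED=<value> line.

    An existing NM_CONTROLLED line is replaced in place; if none exists,
    the setting is appended. Commented-out lines are left untouched.
    """
    out = []
    found = False
    for line in ifcfg_text.splitlines():
        if line.strip().startswith("NM_CONTROLLED="):
            if not found:
                out.append(f"NM_CONTROLLED={value}")
                found = True
            # Any further duplicate NM_CONTROLLED lines are dropped.
        else:
            out.append(line)
    if not found:
        out.append(f"NM_CONTROLLED={value}")
    return "\n".join(out) + "\n"


# Hypothetical file contents, for illustration only.
SAMPLE = "DEVICE=eth0\nONBOOT=yes\nBOOTPROTO=dhcp\nNM_CONTROLLED=no\n"
UPDATED = set_nm_controlled(SAMPLE)
print(UPDATED, end="")
```

On a real VM the result would be written back to /etc/sysconfig/network-scripts/ifcfg-eth0 as root before the image is recaptured.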

Detailed Analysis

Hope you are doing well. Thanks for your time over remote session. That was a pleasant talk with you. Regarding your issue, I have figured out the root cause. Please check it below:

  1. I checked the VM adminservervm and found that the computer name was set correctly during deployment. The deployment log shows:

     Invoking action VirtualMachines.ResourceOperation.PUT(subscriptionId=685ba005-af8d-4b04-8f16-a7bf38b2eb5a, resourceGroupName=wls-270132377-281-owls-122130-8u131-ol74, resourceName=adminServerVM, vm={ "location": "eastus", "plan": { "name": "owls-122130-8u131-ol74", "publisher": "oracle", "product": "weblogic-122130-jdk8u131-ol74" }, "properties": { "hardwareProfile": { "vmSize": "Standard_A3" }, "storageProfile": { "imageReference": { "publisher": "oracle", "offer": "weblogic-122130-jdk8u131-ol74", "sku": "owls-122130-8u131-ol74", "version": "latest" }, "osDisk": { "createOption": "FromImage", "managedDisk": { "storageAccountType": "Standard_LRS" } }, "dataDisks": [ { "lun": 0, "createOption": "FromImage", "managedDisk": { "storageAccountType": "Standard_LRS" }, "diskSizeGB": 900 } ] }, "osProfile": { "computerName": "adminServerVM", "adminUsername": "weblogic" }, "networkProfile": {"networkInterfaces":[{"id":"/subscriptions/685ba005-af8d-4b04-8f16-a7bf38b2eb5a/resourceGroups/wls-270132377-281-owls-122130-8u131-ol74/providers/Microsoft.Network/networkInterfaces/adminServerVMNIC"}]}, "diagnosticsProfile": { "bootDiagnostics": { "enabled": true, "storageUri": "https://593261olvm.blob.core.windows.net/" } }, "provisioningState": 0 } })

  2. I checked the system log and found out the file /etc/hostname was modified after the VM restart.

Image attachment 1

  3. From the system log, we could see the hostname was changed by NetworkManager.

Image attachment 2

  4. I have reproduced the issue using the Oracle Linux 7.4 image. That means the issue comes from the official image, not from your end.
  5. After discussing with Linux SME Frank, we finally found that the root cause is that NetworkManager was disabled on eth0.

Action Plan:

  1. Edit the file /etc/sysconfig/network-scripts/ifcfg-eth0 and set NM_CONTROLLED=yes
  2. Re-capture the image.
edburns commented 4 years ago

image

edburns commented 4 years ago

image

edburns commented 4 years ago

From @galiacheng 👍

Update for new findings:

The “managedservervm2” that we investigated was able to reach “adminservervm” by the hostname “adminservervm” after a restart.
This is because, after the restart, a new search option “internal.cloudapp.net” was appended to /etc/resolv.conf on “managedservervm2”.

To make “adminservervm.internal.cloudapp.net” visible to managedservervm2 DNS resolver, we can add the search option “internal.cloudapp.net” to /etc/resolv.conf, without a restart.
The DNS resolver will append the suffix to the hostname automatically when managedservervm2 accesses “adminservervm”.
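
To illustrate the two points above, here is a small Python sketch: one helper adds a search domain to resolv.conf text, and another lists the names a resolver would try for a short hostname. The helper names and the nameserver address (a documentation-range IP) are assumptions for illustration, and the lookup order is simplified (it ignores options such as ndots).

```python
def add_search_domain(resolv_text: str, domain: str) -> str:
    """Return resolv.conf text whose search list includes `domain`."""
    lines = resolv_text.splitlines()
    for i, line in enumerate(lines):
        parts = line.split()
        if parts and parts[0] == "search":
            if domain not in parts[1:]:
                lines[i] = " ".join(parts + [domain])
            break
    else:
        # No search line present yet, so add one.
        lines.append(f"search {domain}")
    return "\n".join(lines) + "\n"


def candidate_names(hostname: str, resolv_text: str) -> list:
    """Simplified view of the names tried for `hostname`, in order."""
    if "." in hostname:
        return [hostname]
    domains = []
    for line in resolv_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "search":
            domains = parts[1:]
    return [f"{hostname}.{d}" for d in domains] + [hostname]


# Hypothetical resolv.conf contents; 192.0.2.53 is a placeholder address.
BEFORE = "nameserver 192.0.2.53\n"
AFTER = add_search_domain(BEFORE, "internal.cloudapp.net")
print(AFTER, end="")
print(candidate_names("adminservervm", AFTER))
```

With the search option in place, the short name “adminservervm” is tried as “adminservervm.internal.cloudapp.net” first, which matches the behavior observed on managedservervm2.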

According to the Microsoft.Network networkInterfaces template reference, we can specify “internalDomainNameSuffix” in the ARM template for communication between internal VMs in the same network.
After I specified “internalDomainNameSuffix” with the value “internal.cloudapp.net” and deployed the ARM template, /etc/resolv.conf on the VMs was updated with the search option “internal.cloudapp.net”.
The adminservervm, whose full hostname is “adminservervm.internal.cloudapp.net”, is reachable from other VMs even when they connect using the short hostname “adminservervm”.
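
For reference, a rough sketch of where this setting might sit in a network interface resource, built as a Python dict and printed as JSON. Placing “internalDomainNameSuffix” under `properties.dnsSettings` is my assumption from the template reference; the `apiVersion` placeholder, the `subnetRef` variable, and the IP configuration name are hypothetical and not taken from this repository's templates.

```python
import json


def nic_resource(name: str, suffix: str = "internal.cloudapp.net") -> dict:
    """Build a networkInterfaces resource fragment (illustrative only)."""
    return {
        "type": "Microsoft.Network/networkInterfaces",
        "apiVersion": "<api-version>",  # placeholder, fill in a real version
        "name": name,
        "location": "[resourceGroup().location]",
        "properties": {
            "ipConfigurations": [
                {
                    "name": "ipconfig1",  # hypothetical name
                    "properties": {
                        "privateIPAllocationMethod": "Dynamic",
                        # Hypothetical template variable for the subnet ID.
                        "subnet": {"id": "[variables('subnetRef')]"},
                    },
                }
            ],
            # Assumed location of the suffix setting discussed above.
            "dnsSettings": {"internalDomainNameSuffix": suffix},
        },
    }


NIC = nic_resource("adminServerVMNIC")
print(json.dumps(NIC, indent=2))
```

The printed JSON would go in the template's `resources` array; please check the property path against the template reference for the API version actually in use.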

Because changing the file /etc/sysconfig/network-scripts/ifcfg-eth0 requires capturing a new image, I will instead specify “internalDomainNameSuffix” in the template to solve that error.

Your quick debugging into the root cause helped me a lot in finding this setting.
Feel free to call me if you need more information.
Thank you again.

Regards,
Galia
jacobt123 commented 4 years ago

Offer ID: weblogic-122130-jdk8u131-ol74. Edited the file /etc/sysconfig/network-scripts/ifcfg-eth0 to set NM_CONTROLLED=yes, captured a new image, and published it on Partner Center.