vanzod / Packer


Ubuntu-HPC 18.04.522200 deployment failure #1

Closed vanzod closed 3 years ago

vanzod commented 3 years ago

When deploying the Ubuntu-HPC 18.04.522200 image, ARM fails with:

Deployment failed. Correlation ID: f19fcfce-a0e5-48ea-8fe0-a1405ac8bf1c. {
  "status": "Failed",
  "error": {
    "code": "ResourceDeploymentFailure",
    "message": "The resource operation completed with terminal provisioning state 'Failed'.",
    "details": [
      {
        "code": "OSProvisioningInternalError",
        "message": "OS Provisioning failed for VM 'hpc01' due to an internal error: The VM encountered an error during deployment. Please visit https://aka.ms/linuxprovisioningerror for more information on remediation."
      }
    ]
  }
}

The cluster-init.log file shows the following error:

2021-03-18 22:07:24,756 - util.py[DEBUG]: Reading from /sys/class/net/eth1/name_assign_type (quiet=False)
2021-03-18 22:07:24,756 - util.py[DEBUG]: Reading from /sys/class/net/eth0/name_assign_type (quiet=False)
2021-03-18 22:07:24,756 - __init__.py[DEBUG]: Found unstable nic names: ['eth1', 'eth0']; calling udevadm settle
2021-03-18 22:07:24,757 - subp.py[DEBUG]: Running command ['udevadm', 'settle'] with allowed return codes [0] (shell=False, capture=True)
2021-03-18 22:09:24,900 - util.py[DEBUG]: Waiting for udev events to settle took 120.143 seconds
2021-03-18 22:09:24,900 - handlers.py[DEBUG]: finish: azure-ds/crawl_metadata: FAIL: crawl_metadata
2021-03-18 22:09:24,900 - util.py[DEBUG]: Crawl of metadata service took 120.233 seconds
2021-03-18 22:09:24,900 - azure.py[ERROR]: Could not crawl Azure metadata: Unexpected error while running command.
Command: ['udevadm', 'settle']
Exit code: 1
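
For triaging similar logs, here is a minimal Python sketch that scans log text for the two symptoms visible above: a long `udevadm settle` wait and the Azure metadata crawl error. The function names and the 60-second threshold are hypothetical choices for illustration, and the sample lines are copied from the excerpt above. As far as I know, 120 seconds is the default `udevadm settle` timeout, so a wait of about 120 seconds suggests the command timed out rather than finished.

```python
import re

# Sample lines copied from the log excerpt in this issue.
SAMPLE_LOG = """\
2021-03-18 22:07:24,757 - subp.py[DEBUG]: Running command ['udevadm', 'settle'] with allowed return codes [0] (shell=False, capture=True)
2021-03-18 22:09:24,900 - util.py[DEBUG]: Waiting for udev events to settle took 120.143 seconds
2021-03-18 22:09:24,900 - azure.py[ERROR]: Could not crawl Azure metadata: Unexpected error while running command.
"""

# Matches the duration reported after 'udevadm settle' returns.
SETTLE_RE = re.compile(r"Waiting for udev events to settle took ([0-9.]+) seconds")


def find_slow_settles(log_text, threshold_seconds=60.0):
    """Return every reported settle duration (seconds) above the threshold.

    The 60-second default is an arbitrary value chosen for this sketch.
    """
    durations = []
    for line in log_text.splitlines():
        match = SETTLE_RE.search(line)
        if match and float(match.group(1)) > threshold_seconds:
            durations.append(float(match.group(1)))
    return durations


def has_crawl_failure(log_text):
    """Return True if the log contains the Azure metadata crawl error."""
    return "Could not crawl Azure metadata" in log_text
```

Calling `find_slow_settles(SAMPLE_LOG)` and `has_crawl_failure(SAMPLE_LOG)` on the excerpt flags both symptoms; in practice the text would be read from the log file on the failed VM.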
vanzod commented 3 years ago

Attempt 1 (https://github.com/vanzod/Packer/commit/cf23c31368f6f368584427c2f2c084567ba0fb5d): do not create the 99-disable-network-config.cfg file.

Deployment still fails with the same error.

vanzod commented 3 years ago

Attempt 2 (https://github.com/vanzod/Packer/commit/d1a668211993f65134b9dfbc731c8604c37afe60): do not create 50-cloud-init.yaml.

The issue persists.

vanzod commented 3 years ago

The error is due to a MOFED incompatibility with kernels newer than 5.4.0-1039-azure #41~18.04.1-Ubuntu. The temporary workaround is to avoid Ubuntu 18.04 base images newer than Canonical:UbuntuServer:18_04-lts-gen2:18.04.202101290.
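
A rough Python sketch of how this constraint could be checked before a build, assuming the kernel release follows the `X.Y.Z-NNNN-azure` pattern and the image reference is a `publisher:offer:sku:version` URN. The helper names are hypothetical and this is not part of the Packer configuration in this repository; only the two reference values are taken from the comment above.

```python
import re

# Reference values taken from this issue.
LAST_KNOWN_GOOD_KERNEL = "5.4.0-1039-azure"
LAST_KNOWN_GOOD_IMAGE = "Canonical:UbuntuServer:18_04-lts-gen2:18.04.202101290"


def parse_kernel_release(release):
    """Turn 'X.Y.Z-NNNN-azure' into a tuple of ints for ordering."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)-(\d+)-azure", release)
    if match is None:
        raise ValueError(f"unexpected kernel release format: {release!r}")
    return tuple(int(part) for part in match.groups())


def kernel_is_too_new(release, known_good=LAST_KNOWN_GOOD_KERNEL):
    """Return True if the kernel is newer than the last known-good one."""
    return parse_kernel_release(release) > parse_kernel_release(known_good)


def parse_image_version(urn):
    """Extract the version field of an image URN as a tuple of ints."""
    fields = urn.split(":")
    if len(fields) != 4:
        raise ValueError(f"unexpected image URN format: {urn!r}")
    return tuple(int(part) for part in fields[3].split("."))


def image_is_too_new(urn, known_good=LAST_KNOWN_GOOD_IMAGE):
    """Return True if the image version is newer than the last known-good one.

    Only the version field is compared; publisher, offer and SKU are
    assumed to match the known-good URN.
    """
    return parse_image_version(urn) > parse_image_version(known_good)
```

On a running VM, the kernel string could come from `platform.release()`. For example, `kernel_is_too_new("5.4.0-1040-azure")` (a made-up release used only for illustration) returns `True`, while the known-good release itself returns `False`.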

Reverted settings with commit https://github.com/vanzod/Packer/commit/b80c9f93627487ae833cd4f5fe65d041e167c5ba