openshift / openshift-azure

Azure Red Hat Openshift
https://azure.microsoft.com/en-us/services/openshift/
Apache License 2.0

node vm builds fail with 'dict object' has no attribute 'properties' #591

Closed charlesakalugwu closed 5 years ago

charlesakalugwu commented 6 years ago

The build node vm ci-operator jobs are failing with the following errors:

rhel7 example

TASK [start copy] **************************************************************
Wednesday 10 October 2018  15:32:37 +0000 (0:00:36.186)       0:13:47.515 ***** 
fatal: [localhost]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'properties'\n\nThe error appears to have been in '/usr/share/ansible/openshift-ansible/playbooks/azure/openshift-cluster/tasks/create_blob_from_vm.yml': line 23, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: start copy\n  ^ here\n"}

artifacts: https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/logs/azure-build-node-image-rhel-310/61/

centos7 example

TASK [start copy] **************************************************************
Wednesday 10 October 2018  13:22:14 +0000 (0:00:34.831)       0:14:30.644 ***** 
fatal: [localhost]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'properties'\n\nThe error appears to have been in '/usr/share/ansible/openshift-ansible/playbooks/azure/openshift-cluster/tasks/create_blob_from_vm.yml': line 23, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: start copy\n  ^ here\n"}

artifacts: http://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/logs/azure-build-node-image-centos-310/1/

The failure arises in the create_blob_from_vm.yml file https://github.com/openshift/openshift-ansible/blob/32bd8205276fb3824b3f366f726c88ede5adc259/playbooks/azure/openshift-cluster/tasks/create_blob_from_vm.yml#L26

It looks like the output of the preceding az disk grant-access command in the playbook doesn't contain the expected SAS JSON (it should contain a properties.output.accessSAS key).

It might be worth updating the version of the Azure CLI used in our ci-operator job step images. The latest version of the CLI (2.0.47) was released two days ago (9th October) and it has an interesting entry in the changelog at https://docs.microsoft.com/en-us/cli/azure/release-notes-azure-cli?view=azure-cli-latest#vm:

Fixed empty accessSas field in disk grant-access

It is interesting that the version of the CLI installed in our buildimage container is 2.0.46. Upgrading to the latest az CLI version could fix this issue.

If this doesn't work, then the only other way the output of the az disk grant-access command would be empty is if it is called on a disk that is still attached to a running VM. That means we would need to check the playbooks to make sure that the VM is switched off before performing az disk grant-access.
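The "VM is switched off" check mentioned above could be sketched in Python as follows. This is only an illustration and not what the playbook does: it assumes the VM state comes from `az vm get-instance-view` and that its JSON carries an `instanceView.statuses` list whose entries have a `code` such as `PowerState/deallocated`. The helper names, the sample document, and the choice of `PowerState/deallocated` as the only acceptable state are all assumptions.

```python
# Hypothetical helpers; the JSON shape is an assumption about the output
# of `az vm get-instance-view`, not something taken from the playbook.
ALLOWED_STATES = ("PowerState/deallocated",)


def power_state(vm_view):
    """Return the first `PowerState/...` status code, or None if absent."""
    statuses = (vm_view.get("instanceView") or {}).get("statuses") or []
    for status in statuses:
        code = status.get("code", "")
        if code.startswith("PowerState/"):
            return code
    return None


def safe_to_grant_access(vm_view, allowed=ALLOWED_STATES):
    """True only when the VM reports one of the allowed power states."""
    return power_state(vm_view) in allowed


# Illustrative sample, trimmed to the fields the helpers read.
sample = {
    "instanceView": {
        "statuses": [
            {"code": "ProvisioningState/succeeded"},
            {"code": "PowerState/running"},
        ]
    }
}
print(safe_to_grant_access(sample))
```

A playbook could run a check like this before az disk grant-access and fail early with a clear message instead of hitting an undefined variable later.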

charlesakalugwu commented 6 years ago

There was a recent update to the create_blob_from_vm.yml playbook which tried to work around the missing accessSas field of az disk grant-access: https://github.com/openshift/openshift-ansible/commit/19cb7aee31c846ee44905ab3457ad5562c50fa35

0xmichalis commented 6 years ago

Unfortunately, the az version used inside openshift-ansible is determined by when the installer image was last built:

https://github.com/openshift/openshift-ansible/blob/5bfd3d76f8358f8f9fb40edaa124602c869c9676/images/installer/Dockerfile#L13

As a first step, can we pin its version down to a stable one so we stop these kinds of breakages? Long term we should move out of openshift-ansible, it's really a pita.
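One way to notice version drift until a pin is in place is to compare the installed CLI version against an expected one. A minimal Python sketch, assuming `az -v` prints a first line of the form `azure-cli (2.0.46)`. The `EXPECTED_VERSION` value and the helper names are hypothetical, and the sketch only parses text rather than invoking `az`.

```python
import re

# Hypothetical pin; use whichever version the team settles on.
EXPECTED_VERSION = "2.0.46"


def parse_az_version(az_v_output):
    """Extract the version from a line like `azure-cli (2.0.46)`, or None."""
    match = re.search(r"^azure-cli \((\d+(?:\.\d+)*)\)", az_v_output, re.MULTILINE)
    return match.group(1) if match else None


def matches_pin(az_v_output, expected=EXPECTED_VERSION):
    """True when the reported azure-cli version equals the expected one."""
    return parse_az_version(az_v_output) == expected


sample_output = "azure-cli (2.0.47)\n\nacr (2.1.6)\nacs (2.3.6)\n"
print(parse_az_version(sample_output), matches_pin(sample_output))
```

A check like this only reports a mismatch. Pinning the package version in the image build itself would still be needed to prevent the drift.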

charlesakalugwu commented 6 years ago

Just did some manual tests with the 2.0.46 and 2.0.47 versions of the az CLI, trying to simulate the workflow encoded in the affected playbook.

version 2.0.46

$ az -v
azure-cli (2.0.46)

acr (2.1.5)
acs (2.3.4)
advisor (0.6.0)
ams (0.2.3)
appservice (0.2.4)
backup (1.2.1)
batch (3.4.0)
batchai (0.4.3)
billing (0.2.0)
botservice (0.1.1)
cdn (0.1.1)
cloud (2.1.0)
cognitiveservices (0.2.3)
command-modules-nspkg (2.0.2)
configure (2.0.18)
consumption (0.4.0)
container (0.3.4)
core (2.0.46)
cosmosdb (0.2.1)
dla (0.2.3)
dls (0.1.3)
dms (0.1.1)
eventgrid (0.2.0)
eventhubs (0.2.4)
extension (0.2.1)
feedback (2.1.4)
find (0.2.12)
interactive (0.3.30)
iot (0.3.2)
iotcentral (0.1.2)
keyvault (2.2.3)
lab (0.1.1)
maps (0.3.2)
monitor (0.2.3)
network (2.2.5)
nspkg (3.0.3)
policyinsights (0.1.0)
profile (2.1.1)
rdbms (0.3.2)
redis (0.3.2)
relay (0.1.2)
reservations (0.4.0)
resource (2.1.4)
role (2.1.5)
search (0.1.1)
servicebus (0.2.3)
servicefabric (0.1.3)
signalr (1.0.0)
sql (2.1.4)
storage (2.2.2)
telemetry (1.0.0)
vm (2.2.3)

Python location '/usr/lib64/az/bin/python'
Extensions directory '/home/charles/.azure/cliextensions'

Python (Linux) 2.7.15 (default, Sep 21 2018, 23:26:48) 
[GCC 8.1.1 20180712 (Red Hat 8.1.1-5)]

Legal docs and information: aka.ms/AzureCliLegal

playbook steps with version 2.0.46

mkdir foo && cd foo
az vm show -g charlesakalugwu-dev -n vm > az.vm.show.json
az storage account keys list -n openshiftimages -g images > az.storage.account.json
az disk grant-access --ids $(cat az.vm.show.json | jq .storageProfile.osDisk.managedDisk.id --raw-output) --duration-in-seconds 60 > az.disk.grant.json
cat az.disk.grant.json
cd .. && rm -rf foo

response to az disk grant-access for 2.0.46:

{
  "accessSas": null,
  "endTime": "2018-10-10T23:42:17.6474612+00:00",
  "name": "d2060e48-2eb2-4e7c-b832-9f9c6f321f03",
  "properties": {
    "output": {
      "accessSAS": "xxxxxx"
    }
  },
  "startTime": "2018-10-10T23:42:17.428687+00:00",
  "status": "Succeeded"
}

version 2.0.47

$ az -v
azure-cli (2.0.47)

acr (2.1.6)
acs (2.3.6)
advisor (0.6.0)
ams (0.2.3)
appservice (0.2.5)
backup (1.2.1)
batch (3.4.0)
batchai (0.4.3)
billing (0.2.0)
botservice (0.1.1)
cdn (0.1.1)
cloud (2.1.0)
cognitiveservices (0.2.3)
command-modules-nspkg (2.0.2)
configure (2.0.18)
consumption (0.4.0)
container (0.3.5)
core (2.0.47)
cosmosdb (0.2.1)
dla (0.2.3)
dls (0.1.3)
dms (0.1.1)
eventgrid (0.2.0)
eventhubs (0.3.0)
extension (0.2.2)
feedback (2.1.4)
find (0.2.12)
hdinsight (0.1.0)
interactive (0.3.30)
iot (0.3.3)
iotcentral (0.1.2)
keyvault (2.2.4)
lab (0.1.1)
maps (0.3.2)
monitor (0.2.4)
network (2.2.6)
nspkg (3.0.3)
policyinsights (0.1.0)
profile (2.1.1)
rdbms (0.3.2)
redis (0.3.2)
relay (0.1.2)
reservations (0.4.0)
resource (2.1.4)
role (2.1.7)
search (0.1.1)
servicebus (0.3.0)
servicefabric (0.1.4)
signalr (1.0.0)
sql (2.1.4)
storage (2.2.2)
telemetry (1.0.0)
vm (2.2.4)

Python location '/opt/az/bin/python3'
Extensions directory '/home/charles/.azure/cliextensions'

Python (Linux) 3.6.5 (default, Oct  4 2018, 05:49:33) 
[GCC 7.3.0]

Legal docs and information: aka.ms/AzureCliLegal

playbook steps with version 2.0.47

mkdir foo && cd foo
az vm show -g charlesakalugwu-dev -n vm > az.vm.show.json
az storage account keys list -n openshiftimages -g images > az.storage.account.json
az disk grant-access --ids $(cat az.vm.show.json | jq .storageProfile.osDisk.managedDisk.id --raw-output) --duration-in-seconds 60 > az.disk.grant.json
cat az.disk.grant.json
cd .. && rm -rf foo

response to az disk grant-access for 2.0.47:

{
  "accessSas": "xxxxxx"
}
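
The two responses above put the SAS in different places: 2.0.46 nests it under properties.output.accessSAS with a null top-level accessSas, while 2.0.47 returns only the top-level accessSas. A lookup that tolerates both shapes could look like the Python sketch below. It assumes these two shapes are the only ones in play, and `extract_sas` is a hypothetical helper, not code from the playbook.

```python
import json


def extract_sas(response_text):
    """Return the SAS from `az disk grant-access` JSON output, or None.

    Checks the top-level `accessSas` key first (the 2.0.47 shape), then
    falls back to `properties.output.accessSAS` (the 2.0.46 shape).
    """
    doc = json.loads(response_text)
    sas = doc.get("accessSas")
    if sas:
        return sas
    # Older shape: walk the nested keys without raising on a missing level.
    output = (doc.get("properties") or {}).get("output") or {}
    return output.get("accessSAS")


old_shape = '{"accessSas": null, "properties": {"output": {"accessSAS": "xxxxxx"}}}'
new_shape = '{"accessSas": "xxxxxx"}'
print(extract_sas(old_shape), extract_sas(new_shape))
```

Returning None when neither key is present lets the caller fail with an explicit error instead of an undefined-variable message.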