kubevirt / cloud-image-builder

Scripts and playbooks for creating and testing cloud images containing Kubernetes and KubeVirt
Apache License 2.0

Instance cleanup is timing out on GCP #72

Closed rwsu closed 5 years ago

rwsu commented 6 years ago

From the CI log at https://jenkins-kubevirt.apps.ci.centos.org/blue/rest/organizations/jenkins/pipelines/cloud-image-builder/branches/PR-71/runs/2/log/?start=0 :

[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] [cloud-image-builder_PR-71-JQTBPZAX4DCIWKVPBIRIXXGVDKM26HY3KRKIPXGHE6LEQA2W3BDQ] Running shell script
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] + ansible-playbook -vvv --private-key gcp-test-centos-cleanup.yml
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] ansible-playbook 2.6.1
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] config file = /workDir/workspace/cloud-image-builder_PR-71-JQTBPZAX4DCIWKVPBIRIXXGVDKM26HY3KRKIPXGHE6LEQA2W3BDQ/ansible.cfg
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] configured module search path = [u'/workDir/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] ansible python module location = /usr/lib/python2.7/site-packages/ansible
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] executable location = /usr/bin/ansible-playbook
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] python version = 2.7.15 (default, May 16 2018, 17:50:09) [GCC 8.1.1 20180502 (Red Hat 8.1.1-1)]
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] Using /workDir/workspace/cloud-image-builder_PR-71-JQTBPZAX4DCIWKVPBIRIXXGVDKM26HY3KRKIPXGHE6LEQA2W3BDQ/ansible.cfg as config file
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] Parsed /etc/ansible/hosts inventory source with ini plugin
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] [WARNING]: provided hosts list is empty, only localhost is available. Note
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] that the implicit localhost does not match 'all'
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df]
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] PLAYBOOK: gcp-test-centos-cleanup.yml **
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] 1 plays in gcp-test-centos-cleanup.yml
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df]
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] PLAY [localhost] *
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] META: ran handlers
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df]
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] TASK [Delete the test instance] ****
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] task path: /workDir/workspace/cloud-image-builder_PR-71-JQTBPZAX4DCIWKVPBIRIXXGVDKM26HY3KRKIPXGHE6LEQA2W3BDQ/gcp-test-centos-cleanup.yml:11
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] <127.0.0.1> ESTABLISH LOCAL CONNECTION FOR USER: default
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] <127.0.0.1> EXEC /bin/sh -c 'echo ~default && sleep 0'
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] <127.0.0.1> EXEC /bin/sh -c '( umask 77 && mkdir -p "echo /workDir/.ansible/tmp/ansible-tmp-1540343031.65-22375549268710" && echo ansible-tmp-1540343031.65-22375549268710="echo /workDir/.ansible/tmp/ansible-tmp-1540343031.65-22375549268710" ) && sleep 0'
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] Using module file /usr/lib/python2.7/site-packages/ansible/modules/cloud/google/gce.py
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] <127.0.0.1> PUT /workDir/.ansible/tmp/ansible-local-1622PyY9bx/tmpJu_W2E TO /workDir/.ansible/tmp/ansible-tmp-1540343031.65-22375549268710/gce.py
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] <127.0.0.1> EXEC /bin/sh -c 'chmod u+x /workDir/.ansible/tmp/ansible-tmp-1540343031.65-22375549268710/ /workDir/.ansible/tmp/ansible-tmp-1540343031.65-22375549268710/gce.py && sleep 0'
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] <127.0.0.1> EXEC /bin/sh -c '/usr/bin/python2 /workDir/.ansible/tmp/ansible-tmp-1540343031.65-22375549268710/gce.py && sleep 0'
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] <127.0.0.1> EXEC /bin/sh -c 'rm -f -r /workDir/.ansible/tmp/ansible-tmp-1540343031.65-22375549268710/ > /dev/null 2>&1 && sleep 0'
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] The full traceback is:
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] Traceback (most recent call last):
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] File "/tmp/ansible_6Xe84b/ansible_module_gce.py", line 748, in
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] main()
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] File "/tmp/ansible_6Xe84b/ansible_module_gce.py", line 695, in main
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] module, gce, inames, number, lc_zone, state)
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] File "/tmp/ansible_6Xe84b/ansible_module_gce.py", line 604, in change_instance_state
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] gce.destroy_node(node)
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] File "/usr/lib/python2.7/site-packages/libcloud/compute/drivers/gce.py", line 6428, in destroy_node
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] self.connection.async_request(request, method='DELETE')
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] File "/usr/lib/python2.7/site-packages/libcloud/common/base.py", line 787, in async_request
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] (self.timeout))
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] libcloud.common.types.LibcloudError: <LibcloudError in None 'Job did not complete in 180 seconds'>
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] fatal: [localhost]: FAILED! => {
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] "changed": false,
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] "module_stderr": "Traceback (most recent call last):\n File \"/tmp/ansible_6Xe84b/ansible_module_gce.py\", line 748, in \n main()\n File \"/tmp/ansible_6Xe84b/ansible_module_gce.py\", line 695, in main\n module, gce, inames, number, lc_zone, state)\n File \"/tmp/ansible_6Xe84b/ansible_module_gce.py\", line 604, in change_instance_state\n gce.destroy_node(node)\n File \"/usr/lib/python2.7/site-packages/libcloud/compute/drivers/gce.py\", line 6428, in destroy_node\n self.connection.async_request(request, method='DELETE')\n File \"/usr/lib/python2.7/site-packages/libcloud/common/base.py\", line 787, in async_request\n (self.timeout))\nlibcloud.common.types.LibcloudError: <LibcloudError in None 'Job did not complete in 180 seconds'>\n",
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] "module_stdout": "",
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] "msg": "MODULE FAILURE",
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] "rc": 1
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] }
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] to retry, use: --limit @/workDir/workspace/cloud-image-builder_PR-71-JQTBPZAX4DCIWKVPBIRIXXGVDKM26HY3KRKIPXGHE6LEQA2W3BDQ/gcp-test-centos-cleanup.retry
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df]
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] PLAY RECAP ***
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df] localhost : ok=0 changed=0 unreachable=0 failed=1
[gcp-centos-a5e7fc18-d90e-4a3f-9362-479bbfeeb5df]

rwsu commented 6 years ago

Stopping the instance first and then deleting it works around this issue. It is unclear why a single task that sets the instance state to "absent" now times out.
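For anyone hitting the same timeout, the workaround could look roughly like the playbook below. This is only a sketch and not necessarily what the repository's cleanup playbook does: it assumes the legacy Ansible `gce` module that appears in the traceback, and every variable value (instance name, zone, project, service account, credentials path) is a hypothetical placeholder.

```yaml
# Sketch of a two-step cleanup: stop the instance, then delete it.
# All values under "vars" are hypothetical placeholders.
- hosts: localhost
  connection: local
  gather_facts: false
  vars:
    instance_name: kubevirt-test-instance
    gcp_zone: us-central1-a
    gcp_project_id: my-gcp-project
    gcp_service_account_email: ci@my-gcp-project.iam.gserviceaccount.com
    gcp_credentials_file: /path/to/credentials.json
  tasks:
    # Step 1: stop the instance so the later delete has less work to do.
    - name: Stop the test instance
      gce:
        instance_names: "{{ instance_name }}"
        zone: "{{ gcp_zone }}"
        state: stopped
        project_id: "{{ gcp_project_id }}"
        service_account_email: "{{ gcp_service_account_email }}"
        credentials_file: "{{ gcp_credentials_file }}"

    # Step 2: delete the (now stopped) instance.
    - name: Delete the test instance
      gce:
        instance_names: "{{ instance_name }}"
        zone: "{{ gcp_zone }}"
        state: absent
        project_id: "{{ gcp_project_id }}"
        service_account_email: "{{ gcp_service_account_email }}"
        credentials_file: "{{ gcp_credentials_file }}"
```

Splitting the work this way means each task waits on a shorter GCP operation, so neither one should run into the 180-second limit reported by libcloud in the log above.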

rwsu commented 5 years ago

Fixed with https://github.com/kubevirt/cloud-image-builder/pull/75