Containerized Python-based framework for running and visualizing benchmark workloads on any Kubernetes/OpenShift cluster and runtime kind (pods, Kata Containers, and KubeVirt virtual machines), simply and safely
Apache License 2.0
Hot fix: Add retry mechanism for VM verification #913
Currently, there is no retry mechanism in place when VM verification fails due to a network issue.
I am adding a retry mechanism with a sleep interval between attempts, so that verification fails only when the VM is genuinely unresponsive because it has not been scheduled on the dedicated node, and not because of a transient network error.
See the error below:
File "/usr/local/lib/python3.12/site-packages/benchmark_runner/workloads/bootstorm_vm.py", line 159, in _verify_single_vm
node_ip = self._oc.get_nodes_addresses()[vm_node]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^
KeyError: 'E1002 11:00:00.300037 256719 memcache.go:265] couldn\'t get current server API group list: Get "https://api.bm.perfci.com:6443/api?timeout=32s": dial tcp 198.18.10.3:6443: connect: connection refused\nE1002 11:00:00.300494 256719 memcache.go:265] couldn\'t get current server API group list: Get "https://api.bm.perfci.com:6443/api?timeout=32s": dial tcp 198.18.10.3:6443: connect: connection refused\nE1002 11:00:00.303979 256719 memcache.go:265] couldn\'t get current server API group list: Get "https://api.bm.perfci.com:6443/api?timeout=32s": dial tcp 198.18.10.3:6443: connect: connection refused\nE1002 11:00:00.304439 256719 memcache.go:265] couldn\'t get current server API group list: Get "https://api.bm.perfci.com:6443/api?timeout=32s": dial tcp 198.18.10.3:6443: connect: connection refused\nE1002 11:00:00.306895 256719 memcache.go:265] couldn\'t get current server API group list: Get "https://api.bm.perfci.com:6443/api?timeout=32s": dial tcp 198.18.10.3:6443: connect: connection refused\nE1002 11:00:00.307315 256719 memcache.go:265] couldn\'t get current server API group list: Get "https://api.bm.perfci.com:6443/api?timeout=32s": dial tcp 198.18.10.3:6443: connect: connection refused\nThe connection to the server api.bm.perfci.com:6443 was refused - did you specify the right host or port?'
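For reviewers, the retry-with-sleep idea can be sketched roughly as below. This is a minimal illustration, not the code in this PR: `retry_call`, its parameters (`attempts`, `delay_seconds`, `retry_on`, `sleep`), and the `make_flaky_lookup` helper are hypothetical names introduced here. The sketch assumes the transient failure surfaces as a `KeyError`, as in the traceback above.

```python
import time


def retry_call(func, attempts=3, delay_seconds=5.0, retry_on=(KeyError,), sleep=time.sleep):
    """Call func() until it succeeds or the attempts are exhausted.

    Hypothetical helper: sleeps delay_seconds between failed attempts and
    re-raises the last exception once all attempts have failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                # Out of attempts: surface the original error to the caller.
                raise
            # Give the API server or network time to recover before retrying.
            sleep(delay_seconds)


def make_flaky_lookup(failures):
    """Build a stand-in for the node address lookup that fails `failures` times.

    Hypothetical test double: the first `failures` calls raise KeyError,
    mimicking a lookup keyed by error output instead of a node name.
    """
    state = {"remaining": failures}
    addresses = {"worker-0": "10.0.0.5"}

    def lookup():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            return addresses["connection refused"]  # raises KeyError
        return addresses["worker-0"]

    return lookup
```

With this shape, a call such as `retry_call(make_flaky_lookup(2), attempts=3, delay_seconds=5)` would survive two failed lookups and return the address on the third try. Passing `sleep` as a parameter is only a convenience so the wait can be replaced in tests.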
For security reasons, all pull requests need to be approved first before running any automated CI