redhat-performance / benchmark-runner

Containerized Python-based framework for running and visualizing benchmark workloads on any Kubernetes/OpenShift cluster and runtime kind (pods, Kata Containers, and KubeVirt virtual machines), simply and safely
Apache License 2.0

Hot fix: Add retry mechanism for VM verification #913

Closed ebattat closed 1 month ago

ebattat commented 1 month ago

Type of change

Note: Fill x in []

Description

Currently, there is no retry mechanism in place when VM verification fails due to a network issue. I am adding a retry mechanism with a sleep interval between attempts, so that a VM is reported as unresponsive only when it really has not been scheduled on the dedicated node, and not because of a transient network error.
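A minimal sketch of the retry-with-sleep idea described above. This is not the code from this pull request: the helper `retry`, its parameters, and the stand-in function `flaky_node_ip` are hypothetical names used only for illustration.

```python
import time


def retry(func, attempts=3, delay_seconds=1.0, exceptions=(Exception,), sleep=time.sleep):
    """Call func() until it succeeds or the attempts run out.

    Sleeps delay_seconds between failed attempts and re-raises the last
    error if every attempt fails. All names here are illustrative.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions as err:
            last_error = err
            # Do not sleep after the final attempt; fail right away.
            if attempt < attempts:
                sleep(delay_seconds)
    raise last_error


# Hypothetical stand-in for a lookup that fails twice (for example while
# the API server refuses connections) and then succeeds.
calls = {"count": 0}


def flaky_node_ip():
    calls["count"] += 1
    if calls["count"] < 3:
        raise KeyError("connection refused")
    return "192.0.2.10"


node_ip = retry(flaky_node_ip, attempts=5, delay_seconds=0, exceptions=(KeyError,))
```

Limiting `exceptions` to the errors that are expected to be transient keeps genuine bugs from being silently retried.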

See the error below:

  File "/usr/local/lib/python3.12/site-packages/benchmark_runner/workloads/bootstorm_vm.py", line 159, in _verify_single_vm
    node_ip = self._oc.get_nodes_addresses()[vm_node]
              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^
KeyError: 'E1002 11:00:00.300037  256719 memcache.go:265] couldn\'t get current server API group list: Get "https://api.bm.perfci.com:6443/api?timeout=32s": dial tcp 198.18.10.3:6443: connect: connection refused\nE1002 11:00:00.300494  256719 memcache.go:265] couldn\'t get current server API group list: Get "https://api.bm.perfci.com:6443/api?timeout=32s": dial tcp 198.18.10.3:6443: connect: connection refused\nE1002 11:00:00.303979  256719 memcache.go:265] couldn\'t get current server API group list: Get "https://api.bm.perfci.com:6443/api?timeout=32s": dial tcp 198.18.10.3:6443: connect: connection refused\nE1002 11:00:00.304439  256719 memcache.go:265] couldn\'t get current server API group list: Get "https://api.bm.perfci.com:6443/api?timeout=32s": dial tcp 198.18.10.3:6443: connect: connection refused\nE1002 11:00:00.306895  256719 memcache.go:265] couldn\'t get current server API group list: Get "https://api.bm.perfci.com:6443/api?timeout=32s": dial tcp 198.18.10.3:6443: connect: connection refused\nE1002 11:00:00.307315  256719 memcache.go:265] couldn\'t get current server API group list: Get "https://api.bm.perfci.com:6443/api?timeout=32s": dial tcp 198.18.10.3:6443: connect: connection refused\nThe connection to the server api.bm.perfci.com:6443 was refused - did you specify the right host or port?'
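A note on reading this traceback: the key reported by the `KeyError` is the CLI's own "connection refused" output, which suggests that `vm_node` held error text rather than a node name when the lookup ran. The sketch below shows one way such a lookup could be guarded so the failure is short and easy to retry on. It is an assumption for illustration, not the project's code: `NodeLookupError` and `resolve_node_ip` are hypothetical names.

```python
class NodeLookupError(RuntimeError):
    """Raised when a value does not match any known node name (hypothetical)."""


def resolve_node_ip(nodes_addresses, vm_node):
    """Return the address for vm_node from a {node_name: ip} mapping.

    Raises NodeLookupError with a truncated message instead of a KeyError
    whose key is an entire block of CLI error output.
    """
    if vm_node not in nodes_addresses:
        raise NodeLookupError(f"unknown node: {vm_node[:80]!r}")
    return nodes_addresses[vm_node]


addresses = {"worker-0": "192.0.2.10"}  # illustrative data only
ip = resolve_node_ip(addresses, "worker-0")
```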

For security reasons, all pull requests need to be approved first before running any automated CI