Open achuzhoy opened 1 year ago
Same behavior reproduced with killing etcd:
` chaos_scenarios: # List of policies/chaos scenarios to load
`
` scenarios:
python3.9 run_kraken.py --config config/kill-etcd.yaml
2023-05-25 12:23:02,066 [INFO] Starting kraken
2023-05-25 12:23:02,075 [INFO] Initializing client to talk to the Kubernetes cluster
2023-05-25 12:23:05,649 [INFO] Publishing kraken status at http://0.0.0.0:8085
2023-05-25 12:23:05,649 [INFO] Publishing kraken status at http://0.0.0.0:8085
2023-05-25 12:23:05,650 [INFO] Starting http server at http://0.0.0.0:8085
2023-05-25 12:23:05,650 [INFO] Fetching cluster info
2023-05-25 12:23:05,658 [INFO] Cluster version is 4.13.0
2023-05-25 12:23:05,659 [INFO] Server URL: https://api.elvis2.qe.lab.redhat.com:6443
2023-05-25 12:23:05,659 [INFO] Generated a uuid for the run: 77d465f6-2149-4233-b9f7-4642e84dffb0
2023-05-25 12:23:05,659 [INFO] Daemon mode not enabled, will run through 1 iterations
2023-05-25 12:23:05,659 [INFO] Executing scenarios for iteration 0
2023-05-25 12:23:05,659 [INFO] connection set up
127.0.0.1 - - [25/May/2023 12:23:05] "GET / HTTP/1.1" 200 -
2023-05-25 12:23:05,660 [INFO] response RUN
2023-05-25 12:23:05,660 [INFO] Running container scenarios
2023-05-25 12:23:08,343 [INFO] Killing container etcd in pod etcd-master-1-2 (ns openshift-etcd)
2023-05-25 12:23:08,466 [INFO] Killing container etcd in pod etcd-master-1-1 (ns openshift-etcd)
2023-05-25 12:23:08,657 [INFO] Scenario kill etcd container successfully injected
Traceback (most recent call last):
  File "/root/krkn/krkn/run_kraken.py", line 421, in <module>
    main(options.cfg)
  File "/root/krkn/krkn/run_kraken.py", line 218, in main
    failed_post_scenarios = pod_scenarios.container_run(
  File "/root/krkn/krkn/kraken/pod_scenarios/setup.py", line 92, in container_run
    failed_post_scenarios = check_failed_containers(
  File "/root/krkn/krkn/kraken/pod_scenarios/setup.py", line 199, in check_failed_containers
    killed_container_list = killed_container_list.remove(item)
AttributeError: 'NoneType' object has no attribute 'remove'
`
How to reproduce: config.yaml should have this scenario ` chaos_scenarios: # List of policies/chaos scenarios to load
The content of the scenario file: ` scenarios:
python3.9 run_kraken.py --config config/kill-api.yaml
2023-05-25 11:58:39,485 [INFO] Starting kraken
2023-05-25 11:58:39,495 [INFO] Initializing client to talk to the Kubernetes cluster
2023-05-25 11:58:42,998 [INFO] Publishing kraken status at http://0.0.0.0:8085
2023-05-25 11:58:42,998 [INFO] Publishing kraken status at http://0.0.0.0:8085
2023-05-25 11:58:42,999 [INFO] Starting http server at http://0.0.0.0:8085
2023-05-25 11:58:43,000 [INFO] Fetching cluster info
2023-05-25 11:58:43,008 [INFO] Cluster version is 4.13.0
2023-05-25 11:58:43,008 [INFO] Server URL: https://api.elvis2.qe.lab.redhat.com:6443
2023-05-25 11:58:43,008 [INFO] Generated a uuid for the run: a713f10c-8b26-4b2c-8a81-8356cff6ef58
2023-05-25 11:58:43,008 [INFO] Daemon mode not enabled, will run through 1 iterations
2023-05-25 11:58:43,009 [INFO] Executing scenarios for iteration 0
2023-05-25 11:58:43,009 [INFO] connection set up
127.0.0.1 - - [25/May/2023 11:58:43] "GET / HTTP/1.1" 200 -
2023-05-25 11:58:43,010 [INFO] response RUN
2023-05-25 11:58:43,010 [INFO] Running container scenarios
2023-05-25 11:58:44,823 [INFO] Killing container openshift-apiserver in pod apiserver-5d45f6d58f-hmpsj (ns openshift-apiserver)
2023-05-25 11:58:44,959 [INFO] Killing container openshift-apiserver in pod apiserver-5d45f6d58f-cd7bv (ns openshift-apiserver)
2023-05-25 11:58:45,071 [INFO] Scenario kill apiserver container successfully injected
Traceback (most recent call last):
  File "/root/krkn/krkn/run_kraken.py", line 421, in <module>
    main(options.cfg)
  File "/root/krkn/krkn/run_kraken.py", line 218, in main
    failed_post_scenarios = pod_scenarios.container_run(
  File "/root/krkn/krkn/kraken/pod_scenarios/setup.py", line 92, in container_run
    failed_post_scenarios = check_failed_containers(
  File "/root/krkn/krkn/kraken/pod_scenarios/setup.py", line 199, in check_failed_containers
    killed_container_list = killed_container_list.remove(item)
AttributeError: 'NoneType' object has no attribute 'remove'
`
The issue reproduced with count set to 3. It did not reproduce with count set to 1.
Note that the cluster has 3 pods.
When the same was attempted against SNO (with a single api pod), the following error was thrown:
2023-05-25 12:06:17,950 [INFO] Killing container openshift-apiserver in pod apiserver-6b77769b8-6j4gg (ns openshift-apiserver)
2023-05-25 12:06:18,083 [ERROR] Trying to kill more containers than were found, try lowering kill count
2023-05-25 12:06:18,083 [ERROR] Scenario kill apiserver container failed
In this case it's an expected error.
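
Both tracebacks stop at the same statement, `killed_container_list = killed_container_list.remove(item)`. Python's `list.remove()` mutates the list in place and returns `None`, so that assignment replaces the list with `None` after the first removal, and a second removal then fails. That would be consistent with the crash appearing at count 3 but not at count 1. The sketch below reproduces the pattern in isolation. The helper names and the (pod, namespace, container) tuple shape are assumptions for illustration, not krkn's actual code.

```python
def remove_recovered_buggy(killed, recovered):
    """Mirror the assignment pattern shown in the traceback."""
    for item in recovered:
        # list.remove() returns None, so `killed` is None after the first pass.
        killed = killed.remove(item)
    return killed


def remove_recovered_fixed(killed, recovered):
    """Possible fix: call remove() for its side effect and keep the list."""
    remaining = list(killed)  # work on a copy so the caller's list is untouched
    for item in recovered:
        remaining.remove(item)
    return remaining


if __name__ == "__main__":
    # Hypothetical entries, using the pod names from the etcd log.
    pods = [
        ("etcd-master-1-2", "openshift-etcd", "etcd"),
        ("etcd-master-1-1", "openshift-etcd", "etcd"),
    ]

    # One recovered container: no exception, but the result is None.
    print(remove_recovered_buggy(pods[:1], pods[:1]))

    # Two recovered containers: the second remove() is called on None.
    try:
        remove_recovered_buggy(list(pods), list(pods))
    except AttributeError as exc:
        print("AttributeError:", exc)

    # The fixed variant returns an empty list once everything has recovered.
    print(remove_recovered_fixed(pods, pods))
```

If the real loop matches this pattern, dropping the assignment and calling `killed_container_list.remove(item)` on its own should avoid the `AttributeError`. Whether the surrounding code relies on the `None` value elsewhere would need to be checked in `setup.py`.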