qiliRedHat closed this 3 years ago
@mffiedler PTAL @wabouhamad @paigerube14
In my first test, after 33 minutes I got the following stack - looks like it was first time project deletion was run:

2021-06-11 20:36:10,038 - INFO - Project deleted : testuser-3-2
2021-06-11 20:36:10,042 - INFO - 0: project.project.openshift.io "testuser-4-1" deleted
2021-06-11 20:36:10,042 - INFO - Project deleted : testuser-4-1
Traceback (most recent call last):
  File "/root/tmp/svt/reliability/reliability.py", line 28, in
    task_manager.start()
  File "/root/tmp/svt/reliability/tasks/TaskManager.py", line 134, in start
    task.execute()
  File "/root/tmp/svt/reliability/tasks/Task.py", line 254, in execute
    login_args.append((name, password, kubeconf))
NameError: name 'name' is not defined

Have not investigated yet.
This is not about project deletion, but about the login action after project deletion. This is a bug introduced when I updated the login part to use the new login function in Session.py. I fixed it in commit https://github.com/openshift/svt/pull/670/commits/4e11837b0c9543650f53a6cfdb844bc6bf06c14e. Thanks!
@mffiedler Thank you for the first review. Please check my reply and comment above. After the review is done, I'll do a rebase to keep a single commit.
    result = self.fn(*self.args, **self.kwargs)
  File ".../svt/reliability/tasks/Task.py", line 243, in <lambda>
    results = executor.map(lambda t: all_pods.check(*t), pod_check_args)
  File ".../svt/reliability/tasks/Pods.py", line 10, in check
    if user == "kubeadmin":
NameError: name 'user' is not defined
Qiujie, I am getting this error when running. I think it is coming from the if statement inside the check pods function; around line 240 you are missing passing the user.
After adding the user to that line I get the following issue. It seems that when checking for the pods it's not looking in the correct project; you're only looking in the default project. Probably just need to add "-n <namespace>" to the line when you are getting the pods.
2021-06-15 09:54:49,933 - INFO - =>oc get pods --kubeconfig /Users/prubenda/PycharmProjects/pplRepos/qili/svt/reliability/kubeconfigs/kubeconfig_kubeadmin| egrep -v "Running|Complete"
2021-06-15 09:54:49,933 - INFO - Pod check failed for namespace : kubeadmin-4
2021-06-15 09:54:50,355 - INFO - 1: No resources found in default namespace.
2021-06-15 09:54:50,356 - ERROR - get pods: failed
Think you just need to have line 11 match the same format as line 13
In that case do you need that if statement because they are the same line?
Thank you for catching this! Regarding the if, my intention was to let the admin persona user show all pods in all namespaces, which is more like what a real admin would do. You made me realize I need to add '-A' to show pods in all namespaces.
@qiliRedHat Can you squash the commits and then @paigerube14 and I can do final review/test? Thanks.
@mffiedler @paigerube14, I squashed the commits.
One last change and I think this is good to go. All users, especially kubeadmin, have to log in once every 24 hours of real time to keep their sessions alive. Running over a weekend I saw admin oc commands start to fail with:
2021-06-21 02:00:44,373 - INFO - =>oc get clusteroperators --kubeconfig /root/tmp/svt2/reliability/kubeconfigs/kubeconfig_kubeadmin -o yaml
2021-06-21 02:00:44,643 - INFO - 1: apiVersion: v1 items: [] kind: List metadata: resourceVersion: "" selfLink: "" error: You must be logged in to the server (Unauthorized)
2021-06-21 02:00:44,643 - ERROR - get clusteroperators: failed
2021-06-21 02:01:14,673 - INFO - =>oc get pods -A --kubeconfig /root/tmp/svt2/reliability/kubeconfigs/kubeconfig_kubeadmin| egrep -v "Running|Complete"
2021-06-21 02:01:14,955 - INFO - 1: error: You must be logged in to the server (Unauthorized)
In the past this was handled by making the admin user login in a task, but we could make it part of the test infrastructure as well. @qiliRedHat Let me know if you prefer to handle this as a separate issue after merging this PR. I am ok with that too
@mffiedler Thanks a lot for letting me know about this expiration. https://github.com/openshift/svt/pull/670/commits/86059b3ed081620e8898a23621a02ed47adabc25 added the mechanism to re-login all users before a task run if it has been 23 hours since the last login. I tested it with a shorter interval (10 minutes) and it works well.
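For reference, a minimal sketch of that kind of re-login check, assuming a hypothetical `do_login` callable and an in-memory map of last-login timestamps (not the code from the commit above):

```python
# Illustrative only: re-login any user whose session is older than 23 hours.
import time

RELOGIN_INTERVAL = 23 * 60 * 60  # seconds
last_login = {}  # user name -> timestamp of the last successful login


def relogin_if_needed(users, do_login):
    """Call the (hypothetical) do_login routine for every stale user before a task run."""
    now = time.time()
    for user in users:
        if now - last_login.get(user, 0) >= RELOGIN_INTERVAL:
            do_login(user)
            last_login[user] = now
```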
Excellent. Let me do one final test for 24+ hours and then I will plan on merging tomorrow.
@mffiedler Thank you for the test and bug catching! I squashed.
Reran yesterday; not for as long as Mike's test, but things looked good to me!
@paigerube14 Thank you for your time reviewing this PR and for the bug you found!
https://issues.redhat.com/browse/OCPQE-4221 Reliability Test Tool Enhancement Plan
Concurrent task run with multiple users - priority 0-1
What
Today the test execution is sequential, first with the default user system:admin and then as the user specified in the "login" action. This task is to add concurrency for each type of user added in phase 1: X admin/developer users can run m tasks (oc commands) concurrently (currently there is only one admin user in the config yaml structure; it can be restructured in the future), and Y end users can visit n applications concurrently.
Why
How
According to the task settings in the config file, choose a 'concurrency' number of users for each 'persona' (admin, developer, user). With the config below, 'testuser-0' and 'testuser-1' will execute the project create operation concurrently, while 10 end users will visit all (100%) applications - every application will be visited by 10 users concurrently.
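The original example config is not reproduced above; the sketch below shows what such a task config could look like. The key names (tasks, action, persona, concurrency, applications) are assumptions for illustration, not the tool's actual schema.

```python
# Hypothetical config sketch, parsed with PyYAML to show the intended structure.
import yaml

example_config = yaml.safe_load("""
tasks:
  - action: create projects
    persona: developer
    concurrency: 2      # testuser-0 and testuser-1 run this task concurrently
  - action: visit apps
    persona: user
    concurrency: 10     # 10 end users visit the applications concurrently
    applications: 100%  # visit all (100%) of the applications
""")

print(example_config["tasks"][1]["concurrency"])  # -> 10
```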
Modify Task.py so that, for each type of action, multiple threads run the task's oc or shell commands.
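A minimal sketch of that approach, assuming each user is identified by its kubeconfig path (illustrative only, not the actual Task.py change):

```python
# Run the same oc command once per user, with one thread per user.
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_oc(args, kubeconfig):
    """Run an oc command as a given user by pointing at that user's kubeconfig."""
    cmd = ["oc"] + args + ["--kubeconfig", kubeconfig]
    return subprocess.run(cmd, capture_output=True, text=True)


def run_concurrently(kubeconfigs, args):
    with ThreadPoolExecutor(max_workers=len(kubeconfigs)) as executor:
        return list(executor.map(lambda kc: run_oc(args, kc), kubeconfigs))
```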
Modify Projects.py, Apps.py, and Pods.py to accept a 'kubeconfig' parameter. Add -n <namespace> to a command where needed, such as when getting pods - a dev user can only list pods in its own namespace. For operations that can only be run by admin, such as labeling a namespace, use the admin user's kubeconfig.
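A sketch of that namespace handling, with an illustrative helper (the function name and signature are assumptions): the admin persona lists pods across all namespaces with -A, while a dev persona is restricted to its own namespace with -n.

```python
def pod_check_command(user, kubeconfig, namespace=None):
    """Build the oc command used to check pods for one user (illustrative)."""
    cmd = ["oc", "get", "pods", "--kubeconfig", kubeconfig]
    if user == "kubeadmin":
        cmd.append("-A")          # admin persona: pods in all namespaces
    else:
        cmd += ["-n", namespace]  # dev persona: only its own namespace
    return cmd
```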
Add global locks to protect the shared mutable counters, such as the project id, total project count, app count, and build count.
The '+=' operations are not thread safe. Is adding and popping a dict entry in a shared map from multiple threads thread safe? I think yes, thanks to the GIL: https://docs.python.org/3/glossary.html#term-global-interpreter-lock, https://docs.python.org/3/faq/library.html#what-kinds-of-global-value-mutation-are-thread-safe
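A minimal sketch of the lock-protected counters (variable and function names are illustrative):

```python
# '+=' on a shared int is not atomic across threads, so guard it with a lock.
import threading

counter_lock = threading.Lock()
total_projects = 0
next_project_id = 0


def next_project_name(prefix):
    """Atomically bump the shared counters and return a unique project name."""
    global total_projects, next_project_id
    with counter_lock:
        project_id = next_project_id
        next_project_id += 1
        total_projects += 1
    return f"{prefix}-{project_id}"
```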
Handle situations caused by async operations. For example, an app is chosen to be upgraded while the project deletion (--wait=false makes it async) actually completes after that.
For app visits, the current solution visits sequentially. Implement concurrency with coroutines: asyncio (https://github.com/timofurrer/awesome-asyncio/blob/master/README.md) + aiohttp. Coroutines can support high concurrency for the user persona, and are better here than multi-threading because they are single-threaded and multiplex I/O access, so the cost is lower. Add a new LoadApp module to load apps and use it to count succeeded and failed application visits. The visit function in the Apps module is kept to verify application route availability after app creation. In this way, the visit success and failure counts only include visits made after the route is available; 503s before availability are not counted. This better reflects the user persona's concern - visiting the app after it is ready. An example app visit task config would specify that 10 persona users visit all applications concurrently.
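A minimal sketch of the asyncio + aiohttp approach, counting successes and failures in a plain dict rather than the eventual LoadApp module (the names and the 200-only success rule are assumptions):

```python
import asyncio
import aiohttp


async def visit(session, url, results):
    # Count a visit as a success only when the route answers 200.
    try:
        async with session.get(url) as resp:
            results["success" if resp.status == 200 else "failure"] += 1
    except aiohttp.ClientError:
        results["failure"] += 1


async def visit_all(urls, concurrency=10):
    """Visit every app route `concurrency` times, multiplexed on a single thread."""
    results = {"success": 0, "failure": 0}
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(visit(session, url, results)
                               for url in urls
                               for _ in range(concurrency)))
    return results


# Usage with a hypothetical route: asyncio.run(visit_all(["http://app.example.com"], 10))
```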
Exit Criteria
A config with all kinds of tasks can be run with multiple users in parallel.
python reliability.py -f config/enhance_reliability.yaml