Open chandrams opened 1 week ago
I commented out the test_list_recommendations_cpu_mem_optimised test, which failed with a 502 error, and ran the sanity test suite manually; all the tests pass now. I will run the entire test suite and check again.
I see an error occurring while creating the experiment. It could be related to the state, such as whether Kruize and its related pods, including the database service, are ready to handle the request.
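One way to rule out the readiness theory would be to gate the first request on a readiness check. Below is a minimal sketch, not the test suite's actual code: `wait_until_ready` and its `probe` argument are hypothetical names, and which Kruize endpoint or pod status makes a suitable probe is left open here.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 120.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll probe() until it returns True or timeout_s elapses.

    probe is a hypothetical caller-supplied check (for example, a cheap
    request to the service that returns True on success). Exceptions
    raised by the probe are treated as "not ready yet".
    """
    deadline = clock() + timeout_s
    while True:
        try:
            if probe():
                return True
        except Exception:
            # Connection errors while the pod is starting count as not ready.
            pass
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

If the tests still hit a 502 after such a gate reports ready, the failure is more likely a restart during the run than a startup race.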
Yes, the create experiment call failed with a 502 error in this job, so I commented out the test below; the other tests work fine.
We need to check why the 502 error occurs when we run the entire sanity bucket.
After commenting out the above test and running the entire functional test suite manually, two new tests failed with a 502 error response from list recommendations:
Listing the recommendations...
URL = http://kruize-openshift-tuning.apps.kruize-scalelab.h0b5.p1.openshiftapps.com/listRecommendations
PARAMS = {'experiment_name': 'quarkus-resteasy-kruize-min-http-response-time-db_0'}
Response status code = 502
************************************************************
<html><body><h1>502 Bad Gateway</h1>
The server returned an invalid or incomplete response.
</body></html>
.
.
FAILED test_list_recommendations.py::test_list_recommendations_for_diff_reco_terms_with_only_latest[long_term_test_true-15-reco_json_schema4-360.0-True-False]
FAILED test_list_recommendations.py::test_list_recommendations_for_diff_reco_terms_with_only_latest[long_term_test_false-15-reco_json_schema5-360.0-False-False]
========== 17 failed, 10 passed, 334 deselected in 2597.64s (0:43:17) ==========
########### Results Summary of the test suite remote_monitoring_tests ##########
remote_monitoring_tests took 2988 seconds
Number of tests performed 358
Number of tests passed 313
Number of tests failed 45
~~~~~~~~~~~~~~~~~~~~~~~ remote_monitoring_tests failed ~~~~~~~~~~~~~~~~~~~~~~~~~~
Failed cases are :
negative
extended
I executed the test suite again and the above 2 failures were not seen, so the 502 error seems to be intermittent.
########### Results Summary of the test suite remote_monitoring_tests ##########
remote_monitoring_tests took 5843 seconds
Number of tests performed 358
Number of tests passed 315
Number of tests failed 43
~~~~~~~~~~~~~~~~~~~~~~~ remote_monitoring_tests failed ~~~~~~~~~~~~~~~~~~~~~~~~~~
Failed cases are :
negative
extended
Check Log Directory: /home/jenkins/test_res_alltests_0.0.24_skip_cpu_mem_optimized/kruize_test_results/kruize_20240904:07:45:37/remote_monitoring_tests for failed cases
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
************************************** done *************************************
*********************************************************************************
~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Overall summary of the tests ~~~~~~~~~~~~~~~~~~~~~~~
Total time taken to perform the test 5843 seconds
Total Number of test suites performed 1
Total Number of tests performed 358
Total Number of tests passed 315
Total Number of tests failed 43
These 43 failures are due to known issues.
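Because the 502 shows up intermittently, it may help to record how often it occurs per request rather than only seeing the final test failure. The following is a diagnostic sketch under stated assumptions: `send_with_502_log` is a hypothetical helper, and `send` is assumed to be any zero-argument callable returning an object with a `status_code` attribute (as a `requests` response has). Retrying can hide a pod restart, so this is meant for triage, not as a fix.

```python
import time
from typing import Any, Callable, List, Tuple


def send_with_502_log(
    send: Callable[[], Any],
    max_attempts: int = 3,
    delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Any, List[int]]:
    """Call send() until it returns a non-502 response or attempts run out.

    Returns (last_response, statuses), where statuses lists every status
    code observed, so intermittent 502s stay visible in the test output.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    statuses: List[int] = []
    response = None
    for attempt in range(1, max_attempts + 1):
        response = send()
        statuses.append(response.status_code)
        if response.status_code != 502:
            break
        if attempt < max_attempts:
            # Pause briefly before retrying the same request.
            sleep(delay_s)
    return response, statuses
```

For the list recommendations call, `send` could be something like `lambda: requests.get(URL, params=PARAMS)`; a `statuses` list such as `[502, 200]` would then point to a transient gateway error rather than a functional failure.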
I executed only the sanity bucket with the previously skipped test, test_list_recommendations_cpu_mem_optimised, enabled. It passed, and I did not see the 502 error.
########### Results Summary of the test suite remote_monitoring_tests ##########
remote_monitoring_tests took 2051 seconds
Number of tests performed 155
Number of tests passed 155
Number of tests failed 0
~~~~~~~~~~~~~~~~~~~~~~ remote_monitoring_tests passed ~~~~~~~~~~~~~~~~~~~~~~~~~~
************************************** done *************************************
Logs of another sanity run that failed with a Kruize pod restart: test_res_sanity_functional_0.0.24.zip
I ran one of the failing tests on its own with the builds below. Here are the results:
pytest -s test_list_recommendations.py::test_list_recommendations_cpu_mem_optimised --cluster_type openshift
Executed this test 5 times:
With 0.0.22_mvp, did not see the failure (could be very intermittent, though I did not see the failure in 5 runs)
With 0.0.23_mvp, test failed 2 out of 5 runs
With 0.0.24_mvp, test failed 2 out of 5 runs
Note: When the test fails, the Kruize pod is restarted.
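To make the repeated runs reproducible, the loop can be scripted. This is a minimal sketch assuming the pytest command shown above is run from the directory containing test_list_recommendations.py; `run_once` and `count_failures` are hypothetical helper names, not part of the test suite.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# The command from this comment, split into arguments.
CMD = [
    "pytest", "-s",
    "test_list_recommendations.py::test_list_recommendations_cpu_mem_optimised",
    "--cluster_type", "openshift",
]


def run_once(cmd: Sequence[str]) -> int:
    """Run the command once and return its exit code (0 means pytest passed)."""
    return subprocess.run(list(cmd)).returncode


def count_failures(
    cmd: Sequence[str],
    runs: int = 5,
    runner: Callable[[Sequence[str]], int] = run_once,
) -> Tuple[int, List[int]]:
    """Run cmd `runs` times; return (number of non-zero exits, all exit codes)."""
    codes = [runner(cmd) for _ in range(runs)]
    failed = sum(1 for code in codes if code != 0)
    return failed, codes
```

Calling `count_failures(CMD)` against each build gives a failed-out-of-5 count comparable to the numbers above; more runs would be needed to say anything firm about 0.0.22_mvp.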
@msvinaykumar @khansaad - Can you please take a look at this issue?
Describe the bug
Kruize remote monitoring functional tests are failing with different issues on OpenShift with the latest Kruize 0.0.24_mvp image.
https://ci.app-svc-perf.corp.redhat.com/job/ExternalTeams/job/Autotune/job/kruize_release_tests/139/ - Kruize scalelab
https://ci.app-svc-perf.corp.redhat.com/job/ExternalTeams/job/Autotune/job/kruize_functional_tests/128/ - Kruize scalelab
https://ci.app-svc-perf.corp.redhat.com/job/ExternalTeams/job/Autotune/job/kruize_functional_tests/128/testReport/junit/rest_apis/test_list_recommendations/test_list_recommendations_profile_notifications_cpu_zero_test_1_True_update_metrics0_323002_CPU_usage_is_zero__No_CPU_Recommendations_can_be_generated_/
https://ci.app-svc-perf.corp.redhat.com/job/ExternalTeams/job/Autotune/job/kruize_functional_tests/128/testReport/junit/rest_apis/test_list_recommendations/test_list_recommendations_cpu_mem_optimised/