chandrams closed this issue 2 months ago
Hitting 504 Gateway Time-out issues in some of the clients in the 100k scalability run
Current status: upload of 14 days of results completed, 15th day in progress
kubectl exec -it `kubectl get pods -o=name -n openshift-tuning | grep postgres` -n openshift-tuning -- psql -U admin -d kruizeDB -c "SELECT count(*) from public.kruize_experiments ;";
kubectl exec -it `kubectl get pods -o=name -n openshift-tuning | grep postgres` -n openshift-tuning -- psql -U admin -d kruizeDB -c "SELECT count(*) from public.kruize_results ;";
kubectl exec -it `kubectl get pods -o=name -n openshift-tuning | grep postgres` -n openshift-tuning -- psql -U admin -d kruizeDB -c "SELECT count(*) from public.kruize_recommendations;";
kubectl exec -it `kubectl get pods -o=name -n openshift-tuning | grep postgres` -n openshift-tuning -- psql -U admin -d kruizeDB -c "SELECT pg_size_pretty( pg_database_size('kruizeDB') );";
count
--------
100000
(1 row)
count
-----------
141001968
(1 row)
count
---------
5875136
(1 row)
pg_size_pretty
----------------
418 GB
(1 row)
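For a quick sanity check of the counts above, the per-experiment averages can be worked out as in the sketch below. The variable names are illustrative only, and the numbers are copied by hand from the psql output rather than queried live.

```shell
#!/bin/sh
# Counts copied from the psql output above (not queried live).
experiments=100000
results=141001968
recommendations=5875136

# Average rows per experiment; awk handles the floating-point division.
results_per_exp=$(awk -v r="$results" -v e="$experiments" 'BEGIN { printf "%.2f", r / e }')
recs_per_exp=$(awk -v r="$recommendations" -v e="$experiments" 'BEGIN { printf "%.2f", r / e }')

echo "results per experiment: $results_per_exp"
echo "recommendations per experiment: $recs_per_exp"
```

This works out to roughly 1410 result rows and about 59 recommendation rows per experiment at this point in the run.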
@msvinaykumar - I have captured the resource usage with the box plots preview for 100k exps in the table in the description. Please review.
cc : @dinogun @rbadagandi @chandrams
With box plots, I observe a 20-hour increase in execution time, a 10 GB rise in Kruize memory, and, surprisingly, not much impact on DB size. @chandrams Could you please run 'listRecommendations' and check the available plot data, just to double-check?
@msvinaykumar - The 10 GB increase is in the Postgres DB.
The run was done a long time ago, so I can't check the plot data.
What's the latest on this and its impact on execution time? @dinogun @chandrams
The short scalability run (5k exps / 15 days) takes around 3 hrs 51 mins with Kruize release 0.0.22_mvp, with resources set as follows:
Kruize - mem req - 4 Gi, mem limit - 8 Gi
Postgres - mem req - 10 Gi, mem limit - 30 Gi
We have created a JIRA (https://issues.redhat.com/browse/KRUIZE-149) to investigate the 100k execution time increase of 24 hrs with 0.0.20.3_mvp (Box plots preview). Closing this issue.
Scalability testing with Kruize build kruize/autotune_operator:0.0.20.3_mvp:
Short Scalability run - 5K exps / 15 days of results / 2 containers per exp / Kruize replicas - 10 / OCP - Scalelab cluster
Long Scalability run - 100K exps / 15 days of results / 2 containers per exp / Kruize replicas - 10 / OCP - AWS cluster