kruize / autotune

Autonomous Performance Tuning for Kubernetes!
Apache License 2.0

Either container / namespace based queries to be run based on the experiment_type in local monitoring #1308

Closed: shreyabiradar07 closed this issue 2 days ago

shreyabiradar07 commented 2 weeks ago

Describe the bug

With the latest image quay.io/kruize/autotune_operator:0.0.25_mvp, both container and namespace queries are executed when generating recommendations for a container-based experiment.

Only the queries relevant to the experiment type should run, so that the datasource is not overloaded when multiple experiments are running.

How to reproduce it

Run the local monitoring demo from kruize-demos and check the pod logs after it completes:

./local_monitoring_demo.sh -c openshift -i quay.io/kruize/autotune_operator:0.0.25_mvp

Expected behavior

Only the queries for experiment_type: container should run; the namespace queries should be skipped.
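
To illustrate the expected behavior, here is a minimal Python sketch of selecting queries by experiment type. It is not Kruize's actual implementation: the `MetricQuery` type, the `select_queries` function, and the query names are hypothetical. The two namespace-level PromQL strings are copied from the logs below; the container-level query is an assumed example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricQuery:
    name: str    # hypothetical query name, for illustration only
    scope: str   # "container" or "namespace"
    promql: str


# Hypothetical catalogue. The namespace queries are taken from the logs in
# this issue; the container query is an assumed example.
QUERIES = [
    MetricQuery(
        "namespaceMemoryRSS",
        "namespace",
        """avg_over_time(sum by(namespace) (container_memory_rss{namespace="default", container!='', container!='POD', pod!=''})[15m:])""",
    ),
    MetricQuery(
        "namespaceTotalPods",
        "namespace",
        'avg_over_time(sum by(namespace) ((kube_pod_info{namespace="default"}))[15m:])',
    ),
    MetricQuery(
        "containerMemoryRSS",
        "container",
        'avg_over_time(container_memory_rss{namespace="default", container="my-container"}[15m])',
    ),
]

VALID_TYPES = {"container", "namespace"}


def select_queries(experiment_type, queries=QUERIES):
    """Return only the queries whose scope matches the experiment type."""
    if experiment_type not in VALID_TYPES:
        raise ValueError(f"unsupported experiment_type: {experiment_type!r}")
    return [q for q in queries if q.scope == experiment_type]
```

With a filter like this in front of the datasource client, a container experiment would issue only the container-scoped queries, and the namespace queries would run only for namespace experiments.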

Relevant logs

2024-09-2412:12:33.494 INFO [qtp62343880-53][GenericRestApiClient.java(96)]-Executing request: GET http://prometheus-k8s.monitoring.svc.cluster.local:9090/api/v1/query_range?query=max_over_time%28sum+by%28namespace%29+%28container_memory_working_set_bytes%7Bnamespace%3D%22default%22%2C+container%21%3D%27%27%2C+container%21%3D%27POD%27%2C+pod%21%3D%27%27%7D%29%5B15m%3A%5D%29&start=1725883952&end=1727179952&step=900 HTTP/1.1
2024-09-2412:12:33.541 INFO [qtp62343880-53][RecommendationEngine.java(2095)]-avg_over_time(sum by(namespace) (container_memory_rss{namespace="default", container!='', container!='POD', pod!=''})[15m:])
2024-09-2412:12:33.541 INFO [qtp62343880-53][RecommendationEngine.java(2104)]-http://prometheus-k8s.monitoring.svc.cluster.local:9090/api/v1/query_range?query=avg_over_time%28sum+by%28namespace%29+%28container_memory_rss%7Bnamespace%3D%22default%22%2C+container%21%3D%27%27%2C+container%21%3D%27POD%27%2C+pod%21%3D%27%27%7D%29%5B15m%3A%5D%29&start=1725883952&end=1727179952&step=900
2024-09-2412:12:33.542 INFO [qtp62343880-53][GenericRestApiClient.java(96)]-Executing request: GET http://prometheus-k8s.monitoring.svc.cluster.local:9090/api/v1/query_range?query=avg_over_time%28sum+by%28namespace%29+%28container_memory_rss%7Bnamespace%3D%22default%22%2C+container%21%3D%27%27%2C+container%21%3D%27POD%27%2C+pod%21%3D%27%27%7D%29%5B15m%3A%5D%29&start=1725883952&end=1727179952&step=900 HTTP/1.1
2024-09-2412:12:33.591 INFO [qtp62343880-53][RecommendationEngine.java(2095)]-min_over_time(sum by(namespace) (container_memory_rss{namespace="default", container!='', container!='POD', pod!=''})[15m:])
2024-09-2412:12:33.591 INFO [qtp62343880-53][RecommendationEngine.java(2104)]-http://prometheus-k8s.monitoring.svc.cluster.local:9090/api/v1/query_range?query=min_over_time%28sum+by%28namespace%29+%28container_memory_rss%7Bnamespace%3D%22default%22%2C+container%21%3D%27%27%2C+container%21%3D%27POD%27%2C+pod%21%3D%27%27%7D%29%5B15m%3A%5D%29&start=1725883952&end=1727179952&step=900
2024-09-2412:12:33.592 INFO [qtp62343880-53][GenericRestApiClient.java(96)]-Executing request: GET http://prometheus-k8s.monitoring.svc.cluster.local:9090/api/v1/query_range?query=min_over_time%28sum+by%28namespace%29+%28container_memory_rss%7Bnamespace%3D%22default%22%2C+container%21%3D%27%27%2C+container%21%3D%27POD%27%2C+pod%21%3D%27%27%7D%29%5B15m%3A%5D%29&start=1725883952&end=1727179952&step=900 HTTP/1.1
2024-09-2412:12:33.640 INFO [qtp62343880-53][RecommendationEngine.java(2095)]-max_over_time(sum by(namespace) (container_memory_rss{namespace="default", container!='', container!='POD', pod!=''})[15m:])
2024-09-2412:12:33.640 INFO [qtp62343880-53][RecommendationEngine.java(2104)]-http://prometheus-k8s.monitoring.svc.cluster.local:9090/api/v1/query_range?query=max_over_time%28sum+by%28namespace%29+%28container_memory_rss%7Bnamespace%3D%22default%22%2C+container%21%3D%27%27%2C+container%21%3D%27POD%27%2C+pod%21%3D%27%27%7D%29%5B15m%3A%5D%29&start=1725883952&end=1727179952&step=900
2024-09-2412:12:33.641 INFO [qtp62343880-53][GenericRestApiClient.java(96)]-Executing request: GET http://prometheus-k8s.monitoring.svc.cluster.local:9090/api/v1/query_range?query=max_over_time%28sum+by%28namespace%29+%28container_memory_rss%7Bnamespace%3D%22default%22%2C+container%21%3D%27%27%2C+container%21%3D%27POD%27%2C+pod%21%3D%27%27%7D%29%5B15m%3A%5D%29&start=1725883952&end=1727179952&step=900 HTTP/1.1
2024-09-2412:12:33.702 INFO [qtp62343880-53][RecommendationEngine.java(2095)]-avg_over_time(sum by(namespace) ((kube_pod_info{namespace="default"}))[15m:])
2024-09-2412:12:33.703 INFO [qtp62343880-53][RecommendationEngine.java(2104)]-http://prometheus-k8s.monitoring.svc.cluster.local:9090/api/v1/query_range?query=avg_over_time%28sum+by%28namespace%29+%28%28kube_pod_info%7Bnamespace%3D%22default%22%7D%29%29%5B15m%3A%5D%29&start=1725883952&end=1727179952&step=900
2024-09-2412:12:33.703 INFO [qtp62343880-53][GenericRestApiClient.java(96)]-Executing request: GET http://prometheus-k8s.monitoring.svc.cluster.local:9090/api/v1/query_range?query=avg_over_time%28sum+by%28namespace%29+%28%28kube_pod_info%7Bnamespace%3D%22default%22%7D%29%29%5B15m%3A%5D%29&start=1725883952&end=1727179952&step=900 HTTP/1.1
2024-09-2412:12:33.724 INFO [qtp62343880-53][RecommendationEngine.java(2095)]-max_over_time(sum by(namespace) ((kube_pod_info{namespace="default"}))[15m:])
2024-09-2412:12:33.724 INFO [qtp62343880-53][RecommendationEngine.java(2104)]-http://prometheus-k8s.monitoring.svc.cluster.local:9090/api/v1/query_range?query=max_over_time%28sum+by%28namespace%29+%28%28kube_pod_info%7Bnamespace%3D%22default%22%7D%29%29%5B15m%3A%5D%29&start=1725883952&end=1727179952&step=900
2024-09-2412:12:33.725 INFO [qtp62343880-53][GenericRestApiClient.java(96)]-Executing request: GET http://prometheus-k8s.monitoring.svc.cluster.local:9090/api/v1/query_range?query=max_over_time%28sum+by%28namespace%29+%28%28kube_pod_info%7Bnamespace%3D%22default%22%7D%29%29%5B15m%3A%5D%29&start=1725883952&end=1727179952&step=900 HTTP/1.1
2024-09-2412:12:33.755 INFO [qtp62343880-53][RecommendationEngine.java(2095)]-avg_over_time(sum by(namespace) ((kube_pod_status_phase{phase="Running", namespace="default"}))[15m:])
2024-09-2412:12:33.755 INFO [qtp62343880-53][RecommendationEngine.java(2104)]-http://prometheus-k8s.monitoring.svc.cluster.local:9090/api/v1/query_range?query=avg_over_time%28sum+by%28namespace%29+%28%28kube_pod_status_phase%7Bphase%3D%22Running%22%2C+namespace%3D%22default%22%7D%29%29%5B15m%3A%5D%29&start=1725883952&end=1727179952&step=900
2024-09-2412:12:33.756 INFO [qtp62343880-53][GenericRestApiClient.java(96)]-Executing request: GET http://prometheus-k8s.monitoring.svc.cluster.local:9090/api/v1/query_range?query=avg_over_time%28sum+by%28namespace%29+%28%28kube_pod_status_phase%7Bphase%3D%22Running%22%2C+namespace%3D%22default%22%7D%29%29%5B15m%3A%5D%29&start=1725883952&end=1727179952&step=900 HTTP/1.1
2024-09-2412:12:33.787 INFO [qtp62343880-53][RecommendationEngine.java(2095)]-max_over_time(sum by(namespace) ((kube_pod_status_phase{phase="Running", namespace="default"}))[15m:])
2024-09-2412:12:33.787 INFO [qtp62343880-53][RecommendationEngine.java(2104)]-http://prometheus-k8s.monitoring.svc.cluster.local:9090/api/v1/query_range?query=max_over_time%28sum+by%28namespace%29+%28%28kube_pod_status_phase%7Bphase%3D%22Running%22%2C+namespace%3D%22default%22%7D%29%29%5B15m%3A%5D%29&start=1725883952&end=1727179952&step=900
2024-09-2412:12:33.789 INFO [qtp62343880-53][GenericRestApiClient.java(96)]-Executing request: GET http://prometheus-k8s.monitoring.svc.cluster.local:9090/api/v1/query_range?query=max_over_time%28sum+by%28namespace%29+%28%28kube_pod_status_phase%7Bphase%3D%22Running%22%2C+namespace%3D%22default%22%7D%29%29%5B15m%3A%5D%29&start=1725883952&end=1727179952&step=900 HTTP/1.1
2024-09-2412:12:33.817 INFO [qtp62343880-53][RecommendationEngine.java(2095)]-max(last_over_time(timestamp((sum by (namespace) (container_cpu_usage_seconds_total{namespace="default"})) > 0 )[15d:]))
2024-09-2412:12:33.818 INFO [qtp62343880-53][RecommendationEngine.java(2104)]-http://prometheus-k8s.monitoring.svc.cluster.local:9090/api/v1/query_range?query=max%28last_over_time%28timestamp%28%28sum+by+%28namespace%29+%28container_cpu_usage_seconds_total%7Bnamespace%3D%22default%22%7D%29%29+%3E+0+%29%5B15d%3A%5D%29%29&start=1725883952&end=1727179952&step=900
2024-09-2412:12:33.819 INFO [qtp62343880-53][GenericRestApiClient.java(96)]-Executing request: GET http://prometheus-k8s.monitoring.svc.cluster.local:9090/api/v1/query_range?query=max%28last_over_time%28timestamp%28%28sum+by+%28namespace%29+%28container_cpu_usage_seconds_total%7Bnamespace%3D%22default%22%7D%29%29+%3E+0+%29%5B15d%3A%5D%29%29&start=1725883952&end=1727179952&step=900 HTTP/1.1
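
The URL-encoded requests above are hard to read by eye. The following Python sketch, using only the standard library, decodes the PromQL from a log line and flags namespace-level queries. The function names are hypothetical, and treating `sum by(namespace)` as the marker of a namespace query is an assumption based on the log lines in this issue.

```python
import re
from urllib.parse import parse_qs, urlsplit

# One request line copied from the logs above.
SAMPLE_LINE = (
    "2024-09-2412:12:33.703 INFO [qtp62343880-53][GenericRestApiClient.java(96)]"
    "-Executing request: GET http://prometheus-k8s.monitoring.svc.cluster.local:9090"
    "/api/v1/query_range?query=avg_over_time%28sum+by%28namespace%29+%28%28"
    "kube_pod_info%7Bnamespace%3D%22default%22%7D%29%29%5B15m%3A%5D%29"
    "&start=1725883952&end=1727179952&step=900 HTTP/1.1"
)


def extract_promql(log_line):
    """Return the decoded PromQL from the first URL in a log line, or None."""
    match = re.search(r"https?://\S+", log_line)
    if match is None:
        return None
    # parse_qs decodes percent-escapes and turns '+' into spaces.
    values = parse_qs(urlsplit(match.group(0)).query).get("query")
    return values[0] if values else None


def is_namespace_level(promql):
    """Assumed heuristic: namespace queries aggregate with sum by(namespace)."""
    return "sumby(namespace)" in promql.replace(" ", "")


def namespace_queries(log_lines):
    """Collect the distinct namespace-level queries found in the log lines."""
    found = []
    for line in log_lines:
        promql = extract_promql(line)
        if promql and is_namespace_level(promql) and promql not in found:
            found.append(promql)
    return found
```

Running `namespace_queries` over the pod logs of a container experiment should return an empty list once only the relevant queries are executed.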

Environment:

Additional context .

shreyabiradar07 commented 2 days ago

Closing the issue, as PR #1309 has been merged.