robusta-dev / krr

Prometheus-based Kubernetes Resource Recommendations
MIT License

Error missing label of app on certain deployment #264

Open namanjain98 opened 2 months ago

namanjain98 commented 2 months ago

Describe the bug: I have deployed KRR in my Kubernetes cluster. While running a simple test, I get an error saying that the app label on a certain deployment is missing, but I am not able to find which deployment that is.

Attaching the error message

I am also receiving this error:

ERROR    An unexpected error occurred    runner.py:332
Traceback (most recent call last):
  File "/Users/namanjain/Documents/krr/robusta_krr/core/runner.py", line 325, in run
    result = await self._collect_result()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/namanjain/Documents/krr/robusta_krr/core/runner.py", line 279, in _collect_result
    scans = await asyncio.gather(*[self._gather_object_allocations(k8s_object) for k8s_object in workloads])
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/namanjain/Documents/krr/robusta_krr/core/runner.py", line 243, in _gather_object_allocations
    recommendation = await self._calculate_object_recommendations(k8s_object)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/namanjain/Documents/krr/robusta_krr/core/runner.py", line 177, in _calculate_object_recommendations
    object.pods = await self._k8s_loader.load_pods(object)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/namanjain/Documents/krr/robusta_krr/core/integrations/kubernetes/__init__.py", line 543, in load_pods
    return await cluster_loader.list_pods(object)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/namanjain/Documents/krr/robusta_krr/core/integrations/kubernetes/__init__.py", line 118, in list_pods
    ret: V1PodList = await loop.run_in_executor(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/namanjain/Documents/krr/robusta_krr/core/integrations/kubernetes/__init__.py", line 120, in <lambda>
    lambda: self.core.list_namespaced_pod(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/lib/python3.11/site-packages/kubernetes/client/api/core_v1_api.py", line 15697, in list_namespaced_pod
    return self.list_namespaced_pod_with_http_info(namespace, **kwargs)  # noqa: E501
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/lib/python3.11/site-packages/kubernetes/client/api/core_v1_api.py", line 15812, in list_namespaced_pod_with_http_info
    return self.api_client.call_api(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/lib/python3.11/site-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/lib/python3.11/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
                    ^^^^^^^^^^^^^
  File "/usr/local/anaconda3/lib/python3.11/site-packages/kubernetes/client/api_client.py", line 373, in request
    return self.rest_client.GET(url,
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/lib/python3.11/site-packages/kubernetes/client/rest.py", line 241, in GET
    return self.request("GET", url,
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/lib/python3.11/site-packages/kubernetes/client/rest.py", line 235, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Audit-Id': '6c010d84-8974-4781-84cb-fd7d95a65e45', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': 'c6fecb2f-6615-45e0-8e84-073e86bb3e81', 'X-Kubernetes-Pf-Prioritylevel-Uid': '6e55d744-0422-4e52-a4ce-de0d56fbdd33', 'Date': 'Wed, 10 Apr 2024 08:53:18 GMT', 'Content-Length': '465'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"unable to parse requirement: values[0][matchLabels]: Invalid value: \"{'app':\": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue',  or 'my_value',  or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')","reason":"BadRequest","code":400}

The error shows that there is some issue with the deployment labels, but it does not say which deployment. Since I have hundreds of deployments running in my cluster, I am not able to find the right one. Can you please help check this?
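The 400 response above quotes the rule the API server applies to every label value in a selector: empty, or alphanumeric characters plus '-', '_' and '.', starting and ending with an alphanumeric character. The value it rejected, {'app':, breaks that rule in several ways, and since the API server applies the same rule when objects are created, a value like that suggests the selector string was assembled incorrectly rather than that one particular deployment carries a bad label. A minimal sketch of the same check, assuming you want to scan the matchLabels of your own workloads; find_invalid_values is a hypothetical helper, and the 63-character limit comes from the general Kubernetes label rules rather than from the message above:

```python
import re

# Regex quoted verbatim in the API server's 400 response (the empty string is allowed).
LABEL_VALUE_RE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")
MAX_LABEL_VALUE_LEN = 63  # Kubernetes caps label values at 63 characters


def is_valid_label_value(value: str) -> bool:
    """Return True if `value` is acceptable as a Kubernetes label value."""
    return len(value) <= MAX_LABEL_VALUE_LEN and LABEL_VALUE_RE.fullmatch(value) is not None


def find_invalid_values(match_labels: dict[str, str]) -> list[tuple[str, str]]:
    """Hypothetical helper: list the (key, value) pairs the API server would reject."""
    return [(key, value) for key, value in match_labels.items() if not is_valid_label_value(value)]


if __name__ == "__main__":
    print(is_valid_label_value("my-application"))  # True
    print(is_valid_label_value("{'app':"))         # False: starts with '{', contains quotes and ':'
```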

aantn commented 2 months ago

Hey, if you run krr with --verbose, does that help you figure it out?

aantn commented 3 weeks ago

Hey, would it be possible to provide more information (e.g. the app label from the problematic pod) to help us fix this?

Without more information, we're going to close this issue until we can reproduce it.

anil-repos commented 2 weeks ago

Getting a similar error:

ERROR    An unexpected error occurred    runner.py:332
Traceback (most recent call last):
  File "C:\Users\anil.kumar\Desktop\krr\robusta_krr\core\runner.py", line 325, in run
    result = await self._collect_result()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\anil.kumar\Desktop\krr\robusta_krr\core\runner.py", line 279, in _collect_result
    scans = await asyncio.gather(*[self._gather_object_allocations(k8s_object) for k8s_object in workloads])
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\anil.kumar\Desktop\krr\robusta_krr\core\runner.py", line 243, in _gather_object_allocations
    recommendation = await self._calculate_object_recommendations(k8s_object)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\anil.kumar\Desktop\krr\robusta_krr\core\runner.py", line 177, in _calculate_object_recommendations
    object.pods = await self._k8s_loader.load_pods(object)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\anil.kumar\Desktop\krr\robusta_krr\core\integrations\kubernetes\__init__.py", line 545, in load_pods
    return await cluster_loader.list_pods(object)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\anil.kumar\Desktop\krr\robusta_krr\core\integrations\kubernetes\__init__.py", line 119, in list_pods
    ret: V1PodList = await loop.run_in_executor(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.2544.0_x64__qbz5n2kfra8p0\Lib\concurrent\futures\thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\anil.kumar\Desktop\krr\robusta_krr\core\integrations\kubernetes\__init__.py", line 121, in <lambda>
    lambda: self.core.list_namespaced_pod(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\anil.kumar\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\kubernetes\client\api\core_v1_api.py", line 15697, in list_namespaced_pod
    return self.list_namespaced_pod_with_http_info(namespace, **kwargs)  # noqa: E501
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\anil.kumar\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\kubernetes\client\api\core_v1_api.py", line 15812, in list_namespaced_pod_with_http_info
    return self.api_client.call_api(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\anil.kumar\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\kubernetes\client\api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\anil.kumar\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\kubernetes\client\api_client.py", line 180, in __call_api
    response_data = self.request(
                    ^^^^^^^^^^^^^
  File "C:\Users\anil.kumar\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\kubernetes\client\api_client.py", line 373, in request
    return self.rest_client.GET(url,
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\anil.kumar\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\kubernetes\client\rest.py", line 241, in GET
    return self.request("GET", url,
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\anil.kumar\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\kubernetes\client\rest.py", line 235, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'c0fbfc29-134c-4255-b585-fa66e718eb2a', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': '8cd1bb38-a9b2-4817-9dde-b2ef4e71ad9e', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'e1fbf700-9281-4251-b87c-603e091edec1', 'Date': 'Fri, 14 Jun 2024 07:32:44 GMT', 'Content-Length': '465'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"unable to parse requirement: values[0][matchLabels]: Invalid value: \"{'app':\": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue',  or 'my_value',  or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')","reason":"BadRequest","code":400}

I get this error for almost all Kubernetes resource kinds: Jobs, CronJobs, DaemonSets, StatefulSets, Rollouts and Deployments. I checked this by commenting out the resource kinds one by one in lines 64-70 of ..\krr\robusta_krr\core\integrations\kubernetes\__init__.py.

The label on these resources is as simple as:

app: my-application
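For reference, the label_selector argument of list_namespaced_pod in the official Python client is a plain string of requirements, so a matchLabels entry like app: my-application should reach the API server as app=my-application. A minimal sketch of that serialization, assuming the selector only uses matchLabels (matchExpressions are ignored); match_labels_to_selector is a hypothetical helper, not a krr function:

```python
def match_labels_to_selector(match_labels: dict[str, str]) -> str:
    """Serialize a matchLabels mapping as an equality-based label selector.

    Hypothetical helper for illustration; matchExpressions are not handled.
    """
    # Each label becomes "key=value"; a comma between requirements means AND.
    return ",".join(f"{key}={value}" for key, value in match_labels.items())


print(match_labels_to_selector({"app": "my-application"}))
# app=my-application
print(match_labels_to_selector({"app": "prometheus", "component": "pushprox"}))
# app=prometheus,component=pushprox
```

The resulting string is what would be passed as label_selector=... when listing the pods of a workload.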
aantn commented 2 weeks ago

What is the full CLI command that you ran krr with? Are you passing a label selector?

anil-repos commented 2 weeks ago

I am running this command:

python krr.py simple --verbose

I get a similar error with the brew installation. Also, I am not passing any label selector.

aantn commented 2 weeks ago

Does this also occur on the branch prometheus-workload-loader? If so, does it occur on that branch if you run krr with --mode prometheus?

aantn commented 2 weeks ago

And if that still does not solve the problem, please try the branch debug-build-anil-repos and share the log lines starting with "Listing pods for namespace=".

We aren't able to replicate ourselves, but with your help I hope that we can get to the bottom of this!

anil-repos commented 2 weeks ago

Hi @aantn, I realized that running krr against a particular namespace does not throw any error. I checked on both the main and prometheus-workload-loader branches:

python krr.py simple --namespace myns

Thanks for your help !

aantn commented 2 weeks ago

Any chance you can still run debug-build-anil-repos with verbose logging and see if you spot the problem? I assume the bug still exists, so it would be great to solve it.

headyj commented 1 week ago

@aantn I just had the same issue on v1.11.0. I ran the following command on both the prometheus-workload-loader and debug-build-anil-repos branches and got the same error on both (with Python 3.12.3):

python krr.py simple --logtostderr \
 -f json \
 --history_duration 720 \
 --allow-hpa \
 --verbose
ERROR    An unexpected error occurred    runner.py:349
Traceback (most recent call last):
  File "/tmp/github/robusta-dev/krr/robusta_krr/core/runner.py", line 342, in run
    result = await self._collect_result()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/github/robusta-dev/krr/robusta_krr/core/runner.py", line 297, in _collect_result
    scans = await asyncio.gather(*[self._gather_object_allocations(k8s_object) for k8s_object in workloads])
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/github/robusta-dev/krr/robusta_krr/core/runner.py", line 236, in _gather_object_allocations
    recommendation = await self._calculate_object_recommendations(k8s_object)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/github/robusta-dev/krr/robusta_krr/core/runner.py", line 156, in _calculate_object_recommendations
    object.pods = await cluster_loader.load_pods(object)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/github/robusta-dev/krr/robusta_krr/core/integrations/kubernetes/cluster_loader/__init__.py", line 160, in load_pods
    return await self._workload_loaders[object.kind].list_pods(object)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/github/robusta-dev/krr/robusta_krr/core/integrations/kubernetes/cluster_loader/loaders/base.py", line 79, in list_pods
    ret: V1PodList = await loop.run_in_executor(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/github/robusta-dev/krr/robusta_krr/core/integrations/kubernetes/cluster_loader/loaders/base.py", line 81, in <lambda>
    lambda: self.core.list_namespaced_pod(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jni/.local/lib/python3.12/site-packages/kubernetes/client/api/core_v1_api.py", line 15697, in list_namespaced_pod
    return self.list_namespaced_pod_with_http_info(namespace, **kwargs)  # noqa: E501
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jni/.local/lib/python3.12/site-packages/kubernetes/client/api/core_v1_api.py", line 15812, in list_namespaced_pod_with_http_info
    return self.api_client.call_api(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jni/.local/lib/python3.12/site-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jni/.local/lib/python3.12/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
                    ^^^^^^^^^^^^^
  File "/home/jni/.local/lib/python3.12/site-packages/kubernetes/client/api_client.py", line 373, in request
    return self.rest_client.GET(url,
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jni/.local/lib/python3.12/site-packages/kubernetes/client/rest.py", line 241, in GET
    return self.request("GET", url,
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jni/.local/lib/python3.12/site-packages/kubernetes/client/rest.py", line 235, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Audit-Id': '16d946e1-6076-4479-b7e5-7e38fb6711bb', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': '3669e6fd-765f-49b2-80b6-c36fbf14ed0b', 'X-Kubernetes-Pf-Prioritylevel-Uid': '3bc3f8ff-716f-4dc2-b998-101d11246242', 'Date': 'Thu, 27 Jun 2024 08:47:19 GMT', 'Content-Length': '489'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"unable to parse requirement: values[0][matchLabels]: Invalid value: \"{'app.kubernetes.io/component':\": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue',  or 'my_value',  or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')","reason":"BadRequest","code":400}
aantn commented 1 week ago

Thanks, on the branch debug-build-anil-repos do you have any lines of output starting with Listing pods for namespace.

The contents of that log will help us figure out what the issue is.

headyj commented 1 week ago

Yep, I do (sorry, I had to anonymise most of the names :-( ):

[14:55:09] INFO     Listing pods for namespace=namespace-1 and label_selector=app.kubernetes.io/instance=namespace-1,app.kubernetes.io/name=my-helm,name=namespace-1-pod-1  __init__.py:118
[14:55:10] DEBUG    Gathering PercentileCPULoader metric for StatefulSet namespace-1/namespace-1-pod-1/my-helm-pod-1  prometheus_metrics_service.py:191
[14:55:13] INFO     Listing pods for namespace=namespace-2 and label_selector=batch.kubernetes.io/controller-uid=b77960f2-6231-41c6-a538-7cb0ce0ae219  __init__.py:118
           DEBUG    Gathering PercentileCPULoader metric for Job namespace-2/namespace-2-pod-3-1704989752/pod-3  prometheus_metrics_service.py:191
           INFO     Listing pods for namespace=namespace-3 and label_selector=batch.kubernetes.io/controller-uid=6a2b4574-60c2-457d-9c46-3d44c5a89073  __init__.py:118
           DEBUG    Gathering PercentileCPULoader metric for Job namespace-3/namespace-3-company-update/update  prometheus_metrics_service.py:191
           INFO     Listing pods for namespace=namespace-4 and label_selector=batch.kubernetes.io/controller-uid=4667ac78-0548-4ff7-b16d-f07907a2c046  __init__.py:118
           DEBUG    Gathering PercentileCPULoader metric for Job namespace-4/namespace-4-pod-3-1705047673/pod-3  prometheus_metrics_service.py:191
[14:56:26] INFO     Listing pods for namespace=namespace-5 and label_selector=app.kubernetes.io/component=backend,app.kubernetes.io/instance=namespace-5,app.kubernetes.io/name=company,name=namespace-5-backend  __init__.py:118
           DEBUG    Gathering PercentileCPULoader metric for Deployment namespace-5/namespace-5-backend/backend  prometheus_metrics_service.py:191
[14:56:27] INFO     Listing pods for namespace=namespace-1 and label_selector=app.kubernetes.io/instance=namespace-1,app.kubernetes.io/name=my-helm,name=namespace-1-nginx  __init__.py:118
           INFO     Listing pods for namespace=namespace-1 and label_selector=app.kubernetes.io/instance=namespace-1,app.kubernetes.io/name=my-helm,name=pod-2  __init__.py:118
           DEBUG    Gathering PercentileCPULoader metric for Deployment namespace-1/namespace-1-nginx/my-helm-nginx  prometheus_metrics_service.py:191
           DEBUG    Gathering PercentileCPULoader metric for Deployment namespace-1/pod-2/my-helm-pod-2  prometheus_metrics_service.py:191
           INFO     Listing pods for namespace=namespace-6 and label_selector=app.kubernetes.io/instance=namespace-6,app.kubernetes.io/name=my-helm,name=pod-2  __init__.py:118
           INFO     Listing pods for namespace=namespace-6 and label_selector=app.kubernetes.io/instance=namespace-6,app.kubernetes.io/name=my-helm,name=namespace-6-nginx  __init__.py:118
[14:56:28] DEBUG    Gathering PercentileCPULoader metric for Deployment namespace-6/namespace-6-nginx/my-helm-nginx  prometheus_metrics_service.py:191
           DEBUG    Gathering PercentileCPULoader metric for Deployment namespace-6/pod-2/my-helm-pod-2  prometheus_metrics_service.py:191
           INFO     Listing pods for namespace=namespace-7 and label_selector=app.kubernetes.io/component=backend,app.kubernetes.io/instance=namespace-7,app.kubernetes.io/name=company,name=namespace-7-pod-4  __init__.py:118
           DEBUG    Gathering PercentileCPULoader metric for Deployment namespace-7/namespace-7-pod-4/pod-4  prometheus_metrics_service.py:191
           INFO     Listing pods for namespace=prometheus and label_selector=app=prometheus,app.kubernetes.io/instance=prometheus,app.kubernetes.io/name=pushprox,component=pushprox,release=prometheus  __init__.py:118
           DEBUG    Gathering PercentileCPULoader metric for Deployment prometheus/prometheus-pushprox/pushprox  prometheus_metrics_service.py:191
           INFO     Listing pods for namespace=namespace-8 and label_selector=app=namespace-8  __init__.py:118
           INFO     Listing pods for namespace=namespace-8 and label_selector=app=curl  __init__.py:118
           DEBUG    Gathering PercentileCPULoader metric for Deployment namespace-8/namespace-8/namespace-8  prometheus_metrics_service.py:191
           DEBUG    Gathering PercentileCPULoader metric for Deployment namespace-8/curl/curl  prometheus_metrics_service.py:191
           INFO     Listing pods for namespace=namespace-9 and label_selector=matchLabels={'app.kubernetes.io/component': 'backend', 'app.kubernetes.io/instance': 'namespace-9', 'app.kubernetes.io/name': 'company', 'name': 'namespace-9-pod-5'}  __init__.py:118

Something else that may be related: even when the JSON export works (if I select a subset of namespaces), the generated JSON is usually not valid. Somewhere in the JSON (depending on which namespaces I run against) there is always this description, which breaks the JSON validity:

[...]
"description": "Simple Strategy\n\nCPU request: 95.0% percentile, limit: unset\nMemory request: max + 15.0%, limit: max + 15.0%\nHistory: 720.0 hours\nStep: 1.25 minutes\n\nAll parameters can be customized. For example: `krr simple --cpu_percentile=90 
--memory_buffer_percentage=15 --history_duration=24 --timeframe_duration=0.5`\n\nLearn more: https://github.com/robusta-dev/krr#algorithm",
  "strategy": {
    "name": "simple",
    "settings": {
      "history_duration": 720.0,
      "timeframe_duration": 1.25,
      "cpu_percentile": 95.0,
      "memory_buffer_percentage": 15.0,
      "points_required": 100,
      "allow_hpa": true,
      "use_oomkill_data": false,
      "oom_memory_buffer_percentage": 25.0
    }
  },
[....]

This is not limited to any of the branches above; as far as I have tested, all of them have the same issue.
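The description value above is split across two lines in the middle of the string (right after --cpu_percentile=90). JSON does not allow an unescaped line break inside a string value, so if that break is really present in the output file (and is not just an artifact of pasting into the issue), every strict JSON parser will reject the document. A minimal check with Python's standard json module, whose json.loads rejects raw control characters inside strings by default; is_valid_json is a hypothetical helper name:

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as JSON with the standard-library parser."""
    try:
        # strict=True is the default: raw control characters (such as a literal
        # newline) inside a string value raise JSONDecodeError.
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# A line break written literally inside the string makes the document invalid...
print(is_valid_json('{"description": "Simple Strategy\nLearn more"}'))   # False
# ...while the escaped two-character sequence \n is valid JSON.
print(is_valid_json('{"description": "Simple Strategy\\nLearn more"}'))  # True
```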

aantn commented 1 week ago

Thank you. Did you include all the matching log lines or only some of them? I am particularly interested in the last log line before the exception.

I am trying to figure out which listing of pods had an invalid app.kubernetes.io/component value and why. From your original log:

\"{'app.kubernetes.io/component':\": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue',  or 'my_value',  or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')","reason":"BadRequest","code":400}

The mystery is which label value broke things and how that is possible.
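The last log line before the exception shows label_selector=matchLabels={...} where every other line shows plain key=value pairs, and the value the API server reports as invalid, {'app.kubernetes.io/component':, is exactly the text of that selector up to the first space. One hypothetical way to produce exactly that string is to join the items of the whole selector object instead of the items of its matchLabels mapping. The sketch below is a reconstruction for illustration only, not krr's code; naive_selector and raw_selector are made-up names, and the labels are taken from the log line above:

```python
def naive_selector(selector: dict) -> str:
    """HYPOTHETICAL buggy serialization: iterates the whole selector object
    ({"matchLabels": {...}}) instead of selector["matchLabels"]."""
    return ",".join(f"{key}={value}" for key, value in selector.items())


# Selector as a raw, unparsed dict that still uses the API's camelCase key.
raw_selector = {
    "matchLabels": {
        "app.kubernetes.io/component": "backend",
        "name": "my-project-gateway",
    }
}

broken = naive_selector(raw_selector)
print(broken)
# matchLabels={'app.kubernetes.io/component': 'backend', 'name': 'my-project-gateway'}
print(broken.split(" ")[0])
# matchLabels={'app.kubernetes.io/component':
```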

Regarding the JSON export, can you please open a separate ticket?

headyj commented 1 week ago

Did you include all the matching log lines or only some of them? I am particularly interested in the last log line before the exception.

Actually, I missed only one, which is the last one before the exception:

           INFO     Listing pods for namespace=my-project and label_selector=matchLabels={'app.kubernetes.io/component': 'backend', 'app.kubernetes.io/instance': 'my-project', 'app.kubernetes.io/name': 'company', 'name': 'my-project-gateway'}  __init__.py:118
           DEBUG    Gathering PercentileCPULoader metric for StatefulSet db-4-lts/core-2/db  prometheus_metrics_service.py:191
           DEBUG    Gathering PercentileCPULoader metric for StatefulSet db-4-lts/core-3/db  prometheus_metrics_service.py:191
           DEBUG    Gathering PercentileCPULoader metric for StatefulSet prometheus/prometheus-alertmanager/alertmanager  prometheus_metrics_service.py:191
           DEBUG    Gathering PercentileCPULoader metric for StatefulSet postgres/postgres-postgresql/postgresql  prometheus_metrics_service.py:191
           DEBUG    Gathering PercentileCPULoader metric for StatefulSet db-standalone-4-lts-ci/db-standalone-4-lts-ci/db  prometheus_metrics_service.py:191
           DEBUG    Gathering PercentileCPULoader metric for StatefulSet rabbitmq/rabbitmq-server/rabbitmq  prometheus_metrics_service.py:191
           ERROR    An unexpected error occurred

And then comes the exception posted above.

Regarding the JSON export, can you please open a separate ticket?

Yep, I will

aantn commented 1 week ago

Thank you, we are very close to fixing this. I've narrowed it down to the problematic code, but I am still unable to reproduce it myself. What is the kind of the Kubernetes workload (e.g. Deployment, StatefulSet) and what are the contents of spec.matchLabels?

aantn commented 1 week ago

Sorry, I mean what are the contents of spec.selector?!

headyj commented 1 week ago

Actually, it's a Rollout (from Argo Rollouts), but it's very close to a Deployment and I don't think that makes a difference. The content of spec.selector is quite standard:

selector:
    matchLabels:
      app.kubernetes.io/component: backend
      app.kubernetes.io/instance: my-project
      app.kubernetes.io/name: company
      name: my-project-gateway
aantn commented 1 week ago

Thanks, that was actually very important information! The Kubernetes Python client renames matchLabels to match_labels on its typed models (e.g. Deployment.spec.selector.match_labels), but that renaming does not occur for CRDs!

I have created a fix here: https://github.com/robusta-dev/krr/pull/308. Can you confirm that it works?
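The renaming described above can be handled by reading the selector in either shape. In the official Python client, built-in kinds are deserialized into typed models whose V1LabelSelector exposes match_labels, while custom resources (for example an Argo Rollout fetched through CustomObjectsApi) come back as plain dicts that keep the API's camelCase matchLabels key. A minimal sketch under those assumptions; get_match_labels and selector_string are hypothetical names, a SimpleNamespace stands in for the typed model so the example runs without a cluster, and this is not necessarily how PR #308 implements the fix:

```python
from collections.abc import Mapping
from types import SimpleNamespace
from typing import Any


def get_match_labels(selector: Any) -> Mapping[str, str]:
    """Return the matchLabels of a label selector, whatever its shape.

    Typed client models expose snake_case attributes (selector.match_labels);
    raw custom-object dicts keep the camelCase API key ("matchLabels").
    """
    if selector is None:
        return {}
    if isinstance(selector, Mapping):
        return selector.get("matchLabels") or {}
    return getattr(selector, "match_labels", None) or {}


def selector_string(selector: Any) -> str:
    """Build an equality-based label selector such as "a=1,b=2"."""
    return ",".join(f"{k}={v}" for k, v in get_match_labels(selector).items())


# Stand-in for a typed V1LabelSelector (only the attribute read above).
typed = SimpleNamespace(match_labels={"app": "curl"})
# Shape of spec.selector in a Rollout fetched as a raw custom object.
raw = {"matchLabels": {"app.kubernetes.io/component": "backend", "name": "my-project-gateway"}}

print(selector_string(typed))  # app=curl
print(selector_string(raw))    # app.kubernetes.io/component=backend,name=my-project-gateway
```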

headyj commented 6 days ago

Yes, I can confirm that it works with Rollouts :+1:

aantn commented 6 days ago

Wonderful, thanks for the confirmation. I've merged the changes into the prometheus-workload-loader branch.