robusta-dev / krr

Prometheus-based Kubernetes Resource Recommendations
MIT License
3.05k stars 160 forks

AWS EKS Prometheus and value_error #346

Closed mxw-sec closed 1 week ago

mxw-sec commented 1 month ago

Describe the bug: Running `krr simple --history-duration X`, I get the same output regardless of the value of X.

WARNING Not enough history available for cluster arn:aws:eks:us-east-1:{ACCOUNT}:cluster/staging-karpenter. runner.py:235
WARNING If the cluster is freshly installed, it might take some time for the enough data to be available. runner.py:238
WARNING Enough data is estimated to be available after 2024-10-04 02:22:55, but will try to calculate recommendations anyway. runner.py:241
INFO Listing scannable objects in arn:aws:eks:us-east-1:{ACCOUNT}:cluster/staging-karpenter `__init__.py`:58
ERROR An unexpected error occurred runner.py:349
Traceback (most recent call last):
  File "robusta_krr/core/runner.py", line 342, in run
  File "robusta_krr/core/runner.py", line 288, in _collect_result
  File "robusta_krr/core/integrations/kubernetes/`__init__.py`", line 531, in list_scannable_objects
  File "robusta_krr/core/integrations/kubernetes/`__init__.py`", line 534, in
  File "robusta_krr/core/integrations/kubernetes/`__init__.py`", line 63, in list_scannable_objects
  File "robusta_krr/core/integrations/kubernetes/`__init__.py`", line 249, in _list_scannable_objects
  File "robusta_krr/core/integrations/kubernetes/`__init__.py`", line 249, in
  File "robusta_krr/core/integrations/kubernetes/`__init__.py`", line 172, in build_scannable_object
  File "robusta_krr/core/models/allocations.py", line 89, in from_container
  File "pydantic/main.py", line 341, in `pydantic.main.BaseModel.__init__`
pydantic.error_wrappers.ValidationError: 1 validation error for ResourceAllocations
limits
  invalid literal for int() with base 10: '3200e6' (type=value_error)

To Reproduce: Run any command such as

krr simple --history-duration X

Expected behavior: Output provided.

Screenshots: image


aantn commented 1 month ago

Weird, I'm not sure what the problem is. The limits field is not an integer; it's a float field, which makes the error especially weird.
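For context, the message in the traceback is exactly what Python's built-in `int()` raises for a string in exponent notation, while `float()` accepts the same string. The snippet below is a plain-Python illustration only, not krr code; where the `int()` call happens inside krr is not shown in this thread.

```python
value = "3200e6"

# float() understands decimal exponent notation.
as_float = float(value)

# int() only accepts plain integer literals, so the same string fails.
try:
    int(value)
    int_error = None
except ValueError as exc:
    int_error = str(exc)

print(as_float)
print(int_error)
```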

Do you see any more information when running with --verbose?

mxw-sec commented 1 month ago

Nothing new when running with --verbose

```
Running Robusta's KRR (Kubernetes Resource Recommender) v.1.15.0
Using strategy: Simple
Using formatter: table

[10:48:29] DEBUG An error occurred while checking for a new version  runner.py:85
Traceback (most recent call last):
  File "robusta_krr/core/runner.py", line 79, in check_newer_version_available
  File "robusta_krr/core/runner.py", line 75, in parse_version_string
ValueError: invalid literal for int() with base 10: ''

[10:48:30] DEBUG Creating kubernetes python cli monkey patches  patch.py:10
DEBUG Found 4 clusters: arn:aws:eks:us-east-1:{Account}:cluster/karpenter-test, arn:aws:eks:us-east-1:{Account}:cluster/test-cluster1, arn:aws:eks:us-west-2:{Account}:cluster/indtest-eks, arn:aws:eks:us-east-1:{Account}:cluster/staging-karpenter  __init__.py:493
DEBUG Current cluster: arn:aws:eks:us-east-1:{Account}:cluster/staging-karpenter  __init__.py:494
DEBUG Configured clusters: []  __init__.py:496
INFO Using clusters: ['arn:aws:eks:us-east-1:{Account}:cluster/staging-karpenter']  runner.py:280
INFO Prometheus URL is specified, will not auto-detect a metrics service  loader.py:55
INFO Trying to connect to Prometheus for arn:aws:eks:us-east-1:{Account}:cluster/staging-karpenter cluster  prometheus_metrics_service.py:68
INFO Using Prometheus at http://127.0.0.1:9090 for cluster arn:aws:eks:us-east-1:{Account}:cluster/staging-karpenter  prometheus_metrics_service.py:97
INFO Prometheus found  loader.py:74
INFO Prometheus connected successfully for arn:aws:eks:us-east-1:{Account}:cluster/staging-karpenter cluster  loader.py:47
DEBUG History range for arn:aws:eks:us-east-1:{Account}:cluster/staging-karpenter: (datetime.datetime(2024, 10, 11, 5, 48, 31), datetime.datetime(2024, 10, 11, 10, 48, 31))  runner.py:231
[10:48:31] INFO Listing scannable objects in arn:aws:eks:us-east-1:{Account}:cluster/staging-karpenter  __init__.py:58
DEBUG Namespaces:  __init__.py:59
DEBUG Resources:  __init__.py:60
DEBUG Listing HPA-v2s in arn:aws:eks:us-east-1:{Account}:cluster/staging-karpenter  __init__.py:189
DEBUG Found 3 HPA-v2 in arn:aws:eks:us-east-1:{Account}:cluster/staging-karpenter  __init__.py:221
DEBUG Listing Deployments in arn:aws:eks:us-east-1:{Account}:cluster/staging-karpenter  __init__.py:189
DEBUG Listing Rollouts in arn:aws:eks:us-east-1:{Account}:cluster/staging-karpenter  __init__.py:189
DEBUG Listing DeploymentConfigs in arn:aws:eks:us-east-1:{Account}:cluster/staging-karpenter  __init__.py:189
DEBUG Listing StatefulSets in arn:aws:eks:us-east-1:{Account}:cluster/staging-karpenter  __init__.py:189
DEBUG Listing DaemonSets in arn:aws:eks:us-east-1:{Account}:cluster/staging-karpenter  __init__.py:189
DEBUG Listing Jobs in arn:aws:eks:us-east-1:{Account}:cluster/staging-karpenter  __init__.py:189
DEBUG Listing CronJobs in arn:aws:eks:us-east-1:{Account}:cluster/staging-karpenter  __init__.py:189
DEBUG DeploymentConfig API not available in arn:aws:eks:us-east-1:{Account}:cluster/staging-karpenter  __init__.py:253
DEBUG Rollout API not available in arn:aws:eks:us-east-1:{Account}:cluster/staging-karpenter  __init__.py:253
DEBUG Found 2 Job in arn:aws:eks:us-east-1:{Account}:cluster/staging-karpenter  __init__.py:221
DEBUG Found 1 CronJob in arn:aws:eks:us-east-1:{Account}:cluster/staging-karpenter  __init__.py:221
DEBUG Found 3 StatefulSet in arn:aws:eks:us-east-1:{Account}:cluster/staging-karpenter  __init__.py:221
DEBUG Found 6 DaemonSet in arn:aws:eks:us-east-1:{Account}:cluster/staging-karpenter  __init__.py:221
DEBUG Found 184 Deployment in arn:aws:eks:us-east-1:{Account}:cluster/staging-karpenter  __init__.py:221
ERROR An unexpected error occurred  runner.py:349
Traceback (most recent call last):
  File "robusta_krr/core/runner.py", line 342, in run
  File "robusta_krr/core/runner.py", line 288, in _collect_result
  File "robusta_krr/core/integrations/kubernetes/__init__.py", line 531, in list_scannable_objects
  File "robusta_krr/core/integrations/kubernetes/__init__.py", line 534, in
  File "robusta_krr/core/integrations/kubernetes/__init__.py", line 63, in list_scannable_objects
  File "robusta_krr/core/integrations/kubernetes/__init__.py", line 249, in _list_scannable_objects
  File "robusta_krr/core/integrations/kubernetes/__init__.py", line 249, in
  File "robusta_krr/core/integrations/kubernetes/__init__.py", line 172, in build_scannable_object
  File "robusta_krr/core/models/allocations.py", line 89, in from_container
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for ResourceAllocations
limits
  invalid literal for int() with base 10: '3200e6' (type=value_error)
```

mxw-sec commented 1 month ago

I am running krr on Ubuntu 22.04.5 in WSL. I installed it via brew and just updated to 1.16.0. Still getting the same issue.

arikalon1 commented 1 month ago

I think 3200e6 is a legal limit in k8s, and krr doesn't know how to parse it correctly. 3200e6 = 3200000000, right?

Screenshot 2024-10-11 at 7 40 15 PM

The issue seems to occur while handling a deployment.
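To make the arithmetic concrete, here is a minimal sketch of a parser for Kubernetes-style quantity strings that also accepts the decimal-exponent form (`3200e6`). It is an illustration only: `parse_quantity` and the suffix tables are hypothetical names written for this example, and this is not the code from krr or from the eventual fix.

```python
import re
from decimal import Decimal

# Binary (power-of-two) suffixes.
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30,
           "Ti": 2**40, "Pi": 2**50, "Ei": 2**60}
# Decimal SI suffixes; the empty string means "no suffix".
_DECIMAL = {"n": Decimal("1e-9"), "u": Decimal("1e-6"), "m": Decimal("1e-3"),
            "": Decimal(1), "k": Decimal("1e3"), "M": Decimal("1e6"),
            "G": Decimal("1e9"), "T": Decimal("1e12"), "P": Decimal("1e15"),
            "E": Decimal("1e18")}

_QUANTITY = re.compile(r"^([+-]?(?:\d+\.?\d*|\.\d+))(.*)$")
_EXPONENT = re.compile(r"^[eE][+-]?\d+$")


def parse_quantity(text: str) -> Decimal:
    """Convert a quantity string such as '3200e6', '1.5Gi' or '100m' to a number."""
    match = _QUANTITY.match(text.strip())
    if match is None:
        raise ValueError(f"not a quantity: {text!r}")
    number, suffix = Decimal(match.group(1)), match.group(2)
    # Decimal-exponent form: 'e' or 'E' followed by a signed integer.
    if _EXPONENT.match(suffix):
        return number * Decimal(10) ** int(suffix[1:])
    if suffix in _BINARY:
        return number * _BINARY[suffix]
    if suffix in _DECIMAL:
        return number * _DECIMAL[suffix]
    raise ValueError(f"unknown suffix in quantity: {text!r}")


print(parse_quantity("3200e6"))
```

The exponent check runs before the suffix lookup so that a bare `E` (exa) and `E` followed by digits (an exponent) are not confused.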

Can you try running this, and see if indeed you have that memory limit?

`kubectl get deployments -A -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.template.spec.containers[*].resources.limits.memory}{"\n"}{end}'`
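With 184 deployments, scanning that output by eye can be tedious. As a rough alternative, the sketch below filters the JSON produced by `kubectl get deployments -A -o json` for memory limits written in exponent notation. The function name `find_exponent_limits` and the sample data are hypothetical, made up for this example.

```python
import json
import re

# Matches values such as '3200e6' or '1.5E3' (a number followed by an exponent).
_EXPONENT_FORM = re.compile(r"^[+-]?\d+(?:\.\d+)?[eE][+-]?\d+$")


def find_exponent_limits(deployment_list):
    """Return (namespace, deployment, container, limit) for exponent-style memory limits."""
    hits = []
    for item in deployment_list.get("items", []):
        meta = item.get("metadata") or {}
        pod_spec = ((item.get("spec") or {}).get("template") or {}).get("spec") or {}
        for container in pod_spec.get("containers", []):
            limits = (container.get("resources") or {}).get("limits") or {}
            memory = limits.get("memory")
            if memory and _EXPONENT_FORM.match(memory):
                hits.append((meta.get("namespace", ""), meta.get("name", ""),
                             container.get("name", ""), memory))
    return hits


# Made-up sample in the shape of `kubectl get deployments -A -o json` output.
sample = json.loads("""
{"items": [
  {"metadata": {"namespace": "default", "name": "api"},
   "spec": {"template": {"spec": {"containers": [
     {"name": "app", "resources": {"limits": {"memory": "3200e6"}}},
     {"name": "sidecar", "resources": {"limits": {"memory": "256Mi"}}}]}}}},
  {"metadata": {"namespace": "default", "name": "worker"},
   "spec": {"template": {"spec": {"containers": [
     {"name": "main", "resources": {}}]}}}}
]}
""")

hits = find_exponent_limits(sample)
print(hits)
```

In practice the input could come from `json.load(sys.stdin)` with the kubectl output piped in.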

arikalon1 commented 1 month ago

Screenshot 2024-10-11 at 7 53 15 PM

moshemorad commented 1 week ago

Issue was fixed in #361