kubecost / features-bugs

A public repository for filing of Kubecost feature requests and bugs. Please read the issue guidelines before filing an issue here.

Aggregator failures after Kubecost 2.2.2 upgrade #68

Closed dhavaln-able closed 1 month ago

dhavaln-able commented 7 months ago

Kubecost Helm Chart Version

2.2.2

Kubernetes Version

1.26

Kubernetes Platform

EKS

Description

I recently upgraded to version 2.2.2 on the recommendation of Kubecost support, after seeing many issues with v2.1.0. Since the upgrade, the aggregator emits the logs below, which I believe is preventing the frontend UI from loading; the Kubecost dashboard is completely blank.


Steps to reproduce

  1. Upgrade to v2.2.2
  2. Check Aggregator logs

Expected behavior

I should not see these error logs in the aggregator pod; they only appear after upgrading to 2.2.2.

Impact

The Kubecost UI isn't showing anything!

Screenshots

No response

Logs

2024-04-23T19:59:38.5146616Z ERR entering state: run_ingestor, err: error creating static tables: %!s(<nil>)
2024-04-23T19:59:38.514674141Z ERR after event, current state: run_ingestor, err: error creating static tables: %!s(<nil>)
2024-04-23T19:59:38.514687612Z ERR error submitting event: error creating static tables: %!s(<nil>)

Slack discussion

No response

Troubleshooting

teevans commented 7 months ago

@williamkubecost - Any ideas here?

williameasiernetworks commented 7 months ago

@dhavaln-able can you share your helm values and describe the aggregator pod for me?

chipzoller commented 6 months ago

Does not appear to be an issue with the Helm chart. Transferred to the correct repository.

AjayTripathy commented 6 months ago

@dhavaln-able can you please share values here and the result of kubectl describe on the aggregator pod?

kgogolek commented 6 months ago

I am having the same issue with 2.2.4, with the same symptoms (the frontend doesn't work).

The main issue seems to be that the following URLs return an nginx 404:

https://kubecost.domain.com/model/providerOptimization
https://kubecost.domain.com/model/diagnostic/coreCount?window=30d

My values are:

values: |

    kubecostProductConfigs:
      awsSpotDataRegion: eu-west-1

    kubecostAggregator:
      resources: 
        requests:
          cpu: 1
          memory: 1Gi          

    forecasting:
      fullImageName: public.ecr.aws/kubecost/kubecost-modeling:v0.1.11

    sigV4Proxy:
      region: eu-west-1
      host: aps-workspaces.eu-west-1.amazonaws.com
      image: public.ecr.aws/aws-observability/aws-sigv4-proxy:1.7
      imagePullPolicy: Always
      resources:
        requests:
          cpu: 0.1
          memory: 64Mi
    global:
      prometheus:
        enabled: false
      amp:
        enabled: true
        prometheusServerEndpoint: http://localhost:8005/workspaces/ws-XXX
        remoteWriteService: https://aps-workspaces.eu-west-1.amazonaws.com/workspaces/ws-XXX/api/v1/remote_write
        sigv4:
          region: eu-west-1
      grafana:
        enabled: false
        domainName: grafana.grafana-dash
        proxy: false
    kubecostToken: "XYZ"
    podSecurityPolicy:
      enabled: false
    networkPolicy:
      enabled: true
      sameNamespace: false
      namespace: monitoring
      costAnalyzer:
          enabled: true 
          ingressRules:
            - selectors:
                - ipBlock:
                    cidr: 10.58.0.0/16
              ports:
                - protocol: TCP
                  port: 9003 
                - protocol: TCP
                  port: 9090
            - selectors:
                - namespaceSelector:
                    matchLabels:
                      name: kubecost
              ports:
                - protocol: TCP
                  port: 9001 
            - selectors:
                - podSelector:
                    matchLabels:
                      app.kubernetes.io/name: cost-analyzer
                - namespaceSelector:
                    matchLabels:
                      name: kubecost
              ports:
              - protocol: TCP
                port: 9003 
            - selectors:
                - namespaceSelector:
                    matchLabels:
                      name: kube-system
                  podSelector:
                    matchLabels:
                      app.kubernetes.io/instance: traefik  
              ports:
              - protocol: TCP
                port: 9003 
              - protocol: TCP
                port: 9090
          egressRules:
            - selectors:
                - ipBlock:
                    cidr: 10.58.0.0/16
              ports:
                - protocol: TCP
                  port: 443 
            - selectors:
                - ipBlock:
                    cidr: 172.20.187.157/32
              ports:
                - protocol: TCP
                  port: 9003 
            - selectors:
                - ipBlock:
                    cidr: 172.20.0.10/32
                - ipBlock:
                    cidr: 8.8.8.8/32
              ports:
              - protocol: TCP
                port: 53 
              - protocol: UDP
                port: 53 

    serviceAccount:
      create: true # Set this to false if you're bringing your own service account.
      annotations: 
        eks.amazonaws.com/role-arn: arn:aws:iam::1234567:role/kubecost-k8s-iam
    networkCosts:
      image:
        repository: public.ecr.aws/kubecost/kubecost-network-costs
      enabled: true
      podSecurityPolicy:
        enabled: false
      services:
        amazon-web-services: true
      additionalSecurityContext:
        readOnlyRootFilesystem: true
    kubecostModel:
      image: public.ecr.aws/kubecost/cost-model
      imagePullPolicy: IfNotPresent
      warmCache: true
      securityContext:
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - all
    kubecostFrontend:
      imagePullPolicy: IfNotPresent
      resources:
        requests:
          cpu: "10m"
          memory: "55Mi"
      securityContext:
          readOnlyRootFilesystem: true
          allowPrivilegeEscalation: false
          capabilities:
            drop:
              - all


sdomme commented 5 months ago

Can confirm that this issue still exists with 2.2.5. Is there anything the community can contribute here? Debug logs, ...?

kgogolek commented 5 months ago

Just want to say that for us it ended up being a network policy. It looks like the ports that need to be open have changed. Looking at the error logs on the Kubecost frontend pod, which showed proxy timeouts, helped me figure out what was going on. Hope this helps someone.
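For readers in a similar situation, here is a minimal sketch of what such a network policy change might look like. It reuses the `networkPolicy.costAnalyzer.ingressRules` structure from the values posted earlier in this thread. The port number 9004 is an assumption taken from the follow-up comment in this thread, not something confirmed here as the port that was blocked; check the frontend pod's error logs to see which connections actually time out in your setup.

```yaml
# Sketch only: extends the existing same-namespace ingress rule with one more port.
# Port 9004 is an assumption (mentioned later in this thread); verify it against
# the proxy timeouts reported in your own frontend pod logs before applying.
networkPolicy:
  enabled: true
  costAnalyzer:
    enabled: true
    ingressRules:
      - selectors:
          - namespaceSelector:
              matchLabels:
                name: kubecost
        ports:
          - protocol: TCP
            port: 9003
          - protocol: TCP
            port: 9004  # assumed additional port
```

Only the extra `port` entry differs from the rules already shown above; the selectors are unchanged so that the policy's scope stays the same.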

sdomme commented 5 months ago

@kgogolek Thanks for the hint. It was indeed related to this "new?" port 9004, but not in the network policy way (at least not for us). When the aggregator runs standalone (as a sidecar), the frontend needs additional configuration for this "new?" port 9004 so that it does not go through the k8s service. That wouldn't make sense anyway, since the aggregator runs in the same pod.

kubecostFrontend:
  aggregator:
    fqdn: "localhost:9004"

This one might also be helpful for folks with this issue.

chipzoller commented 1 month ago

Hello, in an effort to consolidate our bug and feature request tracking, we are deprecating the use of GitHub to track tickets. If this issue is still outstanding and you have not done so already, please raise a request at https://support.kubecost.com/.