[Bug] No API service on port 9001 - crashloop

matt-tvg commented 4 weeks ago

Kubecost Helm Chart Version

2.4.1

Kubernetes Version

v1.31.1-eks-ce1d5eb

Kubernetes Platform

EKS

Description

On a fresh deployment, the cost-analyzer-frontend container crash loops with the error:

nginx: [emerg] host not found in upstream "cost-analyzer.build-ci-kubecost:9003" in /etc/nginx/conf.d/default.conf:42

The ConfigMap holding the nginx config shows the upstream configured as I'd expect:

upstream api {
    server cost-analyzer.build-ci-kubecost:9001;
}

However, there doesn't seem to be a service listening on port 9001:

kubectl get services -n build-ci-kubecost --sort-by=.metadata.name
NAME                        TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
cost-analyzer               ClusterIP   172.20.104.180   <none>        9003/TCP,9090/TCP   24m
cost-analyzer-aggregator    ClusterIP   172.20.120.179   <none>        9004/TCP            24m
cost-analyzer-cloud-cost    ClusterIP   172.20.193.29    <none>        9005/TCP            24m
cost-analyzer-forecasting   ClusterIP   172.20.152.48    <none>        5000/TCP            24m
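To check this kind of observation mechanically, here is a minimal shell sketch that reads `kubectl get services` output and reports whether a named Service exposes a given port. The function name `check_port` is hypothetical, and it assumes the default table layout shown above (PORT(S) as the fifth column). Keep in mind that nginx's `host not found in upstream` message means the hostname itself failed to resolve when nginx loaded its config, so a port check like this only answers part of the question.

```shell
#!/usr/bin/env bash
# check_port SERVICE PORT (hypothetical helper, not part of kubectl or the chart)
# Reads `kubectl get services` table output on stdin. Exits 0 if SERVICE lists
# PORT in its PORT(S) column, 1 otherwise.
check_port() {
  local svc="$1" port="$2"
  awk -v svc="$svc" -v port="$port" '
    # Skip the header row; match the NAME column exactly.
    NR > 1 && $1 == svc {
      # PORT(S) is the 5th column, e.g. "9003/TCP,9090/TCP".
      n = split($5, entries, ",")
      for (i = 1; i <= n; i++) {
        # Drop the "/TCP" suffix, then any ":nodePort" part (e.g. "80:30080").
        split(entries[i], proto, "/")
        split(proto[1], ports, ":")
        if (ports[1] == port) found = 1
      }
    }
    END { exit (found ? 0 : 1) }
  '
}
```

For example, `kubectl get services -n build-ci-kubecost | check_port cost-analyzer 9001 || echo "9001 not exposed"` would print the message against the service list above, since `cost-analyzer` only exposes 9003 and 9090.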

Out of curiosity, I tried pointing it at 9090 with the following values:

  useDefaultFqdn: false
  api:
    fqdn: cost-analyzer.build-ci-kubecost:9090

However, that gave a similar error:

nginx: [emerg] host not found in upstream "cost-analyzer.build-ci-kubecost:9090" in /etc/nginx/conf.d/default.conf:38

Steps to reproduce

Fresh Helm installation on an EKS v1.31 cluster.

Expected behavior

All pods become ready and the frontend is available on port 9090.

Impact

Cannot access the Kubecost UI.

Screenshots

No response

Logs

No response

Slack discussion

No response

Troubleshooting

[X] I have read and followed the issue guidelines and this is a bug impacting only the Helm chart.
[X] I have searched other issues in this repository and mine is not recorded.

chipzoller commented 4 weeks ago

Would you please provide the values you used for installation?

matt-tvg commented 4 weeks ago

Hi,

Templated values are below :)

global:
  prometheus:
    enabled: false
    fqdn: http://prometheus-server.${prometheus_namespace}.svc
  grafana:
    enabled: false
    proxy: false

pricingCsv:
  enabled: false

nodeSelector:
    node-role: ${node_selector}
tolerations:
    - key: CriticalAddonsOnly
      operator: Equal
      value: "true"
      effect: NoSchedule

affinity: {}

# If true, creates a PriorityClass to be used by the cost-analyzer pod
priority:
  enabled: false
  # value: 1000000

# If true, enable creation of NetworkPolicy resources.
networkPolicy:
  enabled: false

podSecurityPolicy:
  enabled: false

kubecostFrontend:
  image: ${aws_account}.dkr.ecr.${aws_region}.amazonaws.com/ecr-public/kubecost/frontend
  imagePullPolicy: Always
  resources:
    requests:
      cpu: "10m"
      memory: "55Mi"
    #limits:
    #  cpu: "100m"
    #  memory: "256Mi"

forecasting:
  fullImageName: ${aws_account}.dkr.ecr.${aws_region}.amazonaws.com/ecr-public/kubecost/kubecost-modeling:v0.1.16
  imagePullPolicy: Always
  nodeSelector:
    node-role: ${node_selector}
  tolerations:
      - key: CriticalAddonsOnly
        operator: Equal
        value: "true"
        effect: NoSchedule

kubecostModel:
  image: ${aws_account}.dkr.ecr.${aws_region}.amazonaws.com/ecr-public/kubecost/cost-model
  imagePullPolicy: Always
  warmCache: true
  warmSavingsCache: true
  etl: true
  # The total number of days the ETL storage will build
  etlStoreDurationDays: 120
  maxQueryConcurrency: 5
  # utcOffset represents a timezone in hours and minutes east (+) or west (-)
  # of UTC, itself, which is defined as +00:00.
  # See the tz database of timezones to look up your local UTC offset:
  # https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
  utcOffset: "+00:00"
  resources:
    requests:
      cpu: "200m"
      memory: "55Mi"
    #limits:
    #  cpu: "800m"
    #  memory: "256Mi"

# Define persistence volume for cost-analyzer
persistentVolume:
  size: 0.2Gi
  dbSize: 32.0Gi
  enabled: true # Note that setting this to false means configurations will be wiped out on pod restart.

service:
  type: ClusterIP
  port: 9090
  targetPort: 9090
  labels: {}
  annotations: {}

reporting:
  productAnalytics: false

The image overrides are purely to make use of pull-through caching; the images are unaltered from source.

chipzoller commented 4 weeks ago

Please confirm all your Pods are in a running state following installation with these values.

matt-tvg commented 4 weeks ago

All containers in the cost-analyzer pod are fine except cost-analyzer-frontend, which crash loops with the above error and causes the pod to report CrashLoopBackOff.

The forecasting pod (1 container) is running fine.

chipzoller commented 4 weeks ago

I just performed an installation on EKS 1.31 (Kubecost 2.4.2) using the defaults and the EKS-specific Helm values with no issues, although this does deploy the bundled Prometheus instance.

helm upgrade -i kubecost \
oci://public.ecr.aws/kubecost/cost-analyzer \
--namespace kubecost --create-namespace \
-f https://raw.githubusercontent.com/kubecost/cost-analyzer-helm-chart/v2.4/cost-analyzer/values-eks-cost-monitoring.yaml

Could you try this temporarily to see whether it works for you?

chipzoller commented 2 weeks ago

Did this work for you?
