kubecost / cost-analyzer-helm-chart

Kubecost helm chart
http://kubecost.com/install
Apache License 2.0

Grafana charts do not show any data #784

Closed: munjalpatel closed this issue 3 years ago

munjalpatel commented 3 years ago

Hello,

For some reason, I do not see any data in the Grafana dashboards. I suspect it's most likely a config issue. Can someone please help me figure it out?


Here is my helm values file:

global:
  # zone: cluster.local (use only if your DNS server doesn't live in the same zone as kubecost)
  prometheus:
    enabled: true # If false, Prometheus will not be installed -- only actively supported on paid Kubecost plans
    fqdn: http://cost-analyzer-prometheus-server.default.svc #example fqdn. Ignored if enabled: true
    # insecureSkipVerify : false # If true, kubecost will not check the TLS cert of prometheus
    # queryServiceBasicAuthSecretName: dbsecret # kubectl create secret generic dbsecret -n kubecost --from-file=USERNAME --from-file=PASSWORD
    # queryServiceBearerTokenSecretName: dbsecret  # kubectl create secret generic dbsecret -n kubecost --from-file=TOKEN

  # Durable storage option, product key required
  thanos:
    enabled: false
    # queryService: http://kubecost-thanos-query-frontend-http.kubecost:{{ .Values.thanos.queryFrontend.http.port }} # an address of the thanos query-frontend endpoint, if different from installed thanos
    # queryServiceBasicAuthSecretName: mcdbsecret #  kubectl create secret generic mcdbsecret -n kubecost --from-file=USERNAME --from-file=PASSWORD <---enter basic auth credentials like that
    # queryServiceBearerTokenSecretName: mcdbsecret # kubectl create secret generic mcdbsecret -n kubecost --from-file=TOKEN
    # queryOffset: 3h # The offset to apply to all thanos queries in order to achieve synchronization on all cluster block stores

  grafana:
    enabled: true # If false, Grafana will not be installed
    domainName: cost-analyzer-grafana.default.svc #example grafana domain. Ignored if enabled: true
    scheme: "http" # http or https, for the domain name above.
    proxy: true # If true, the kubecost frontend will route to your grafana through its service endpoint
    tolerations:
    - effect: NoSchedule
      key: kubernetes.azure.com/scalesetpriority
      operator: Equal
      value: spot

  notifications:
    # Kubecost alerting configuration
    # Ref: http://docs.kubecost.com/alerts
    alertConfigs:
      enabled: false # the example values below are never read unless enabled is set to true
      frontendUrl: http://localhost:9090 # optional, used for linkbacks
      globalSlackWebhookUrl: https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX # optional, used for Slack alerts
      kubecostHealth: false # Alerts generated for kubecost uptime. Uses the globalSlackWebhookUrl to deliver the alert
      globalAlertEmails:
        - admin@company.com
      alerts: # Alerts generated by kubecost, about cluster data
          # Daily namespace budget alert on namespace `kubecost`
        - type: budget # supported: budget, recurringUpdate
          threshold: 50 # optional, required for budget alerts
          window: daily # or 1d
          aggregation: namespace
          filter: kubecost
          ownerContact: # optional, overrides globalAlertEmails default
            - admin@company.com
          slackWebhookUrl: https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX # optional, used for alert-specific Slack alerts
          # Daily cluster budget alert (clusterCosts alert) on cluster `cluster-one`
        - type: budget
          threshold: 200.8 # optional, required for budget alerts
          window: daily # or 1d
          aggregation: cluster
          filter: cluster-one # does not accept csv
          # Recurring weekly update (weeklyUpdate alert)
        - type: recurringUpdate
          window: weekly # or 7d
          aggregation: namespace
          filter: '*'
          # Recurring weekly namespace update on kubecost namespace
        - type: recurringUpdate
          window: weekly # or 7d
          aggregation: namespace
          filter: kubecost
          # Spend Change Alert
        - type: spendChange  # change relative to moving avg
          relativeThreshold: 0.20  # Proportional change relative to baseline. Must be greater than -1 (can be negative)
          window: 1d                # accepts ‘d’, ‘h’
          baselineWindow: 30d       # previous window, offset by window
          aggregation: namespace
          filter: kubecost, default # accepts csv

    alertmanager: # Supply an alertmanager FQDN to receive notifications from the app.
      enabled: false # If true, allow kubecost to write to your alertmanager
      fqdn: http://cost-analyzer-prometheus-server.default.svc #example fqdn. Ignored if prometheus.enabled: true

  # Set saved report(s) accessible from reports.html
  # Ref: http://docs.kubecost.com/saved-reports
  savedReports:
    enabled: false # If true, overwrites report parameters set through UI
    reports:
      - title: "Example Saved Report 0"
        window: "today"
        aggregateBy: "namespace"
        idle: "separate"
        accumulate: false # daily resolution
        filters:
          - property: "cluster"
            value: "cluster-one,cluster*" # supports wildcard filtering and multiple comma separated values
          - property: "namespace"
            value: "kubecost"
      - title: "Example Saved Report 1"
        window: "month"
        aggregateBy: "controllerKind"
        idle: "share"
        accumulate: false
        filters:
          - property: "label"
            value: "app:cost*,environment:kube*"
          - property: "namespace"
            value: "kubecost"
      - title: "Example Saved Report 2"
        window: "2020-11-11T00:00:00Z,2020-12-09T23:59:59Z"
        aggregateBy: "service"
        idle: "hide"
        accumulate: true # entire window resolution
        filters: [] # if no filters, specify empty array

  podAnnotations: {}
    # iam.amazonaws.com/role: role-arn

# Advanced pipeline for custom prices, enterprise key required
pricingCsv:
  enabled: false
  location:
    provider: "AWS"
    region: "us-east-1"
    URI: s3://kc-csv-test/pricing_schema.csv # a valid file URI
    csvAccessCredentials: pricing-schema-access-secret

# SAML integration for user management and RBAC, enterprise key required
# Ref: https://github.com/kubecost/docs/blob/master/user-management.md
saml: 
  enabled: false
  secretName: "kubecost-authzero"
  #metadataSecretName: "kubecost-authzero-metadata" # One of metadataSecretName or idpMetadataURL must be set. defaults to metadataURL if set
  idpMetadataURL: "https://dev-elu2z98r.auth0.com/samlp/metadata/c6nY4M37rBP0qSO1IYIqBPPyIPxLS8v2"
  appRootURL: "http://localhost:9090" # sample URL
  # audienceURI: "http://localhost:9090" # by convention, the same as the appRootURL, but any string uniquely identifying kubecost to your SAML IdP. Optional if you follow the convention
  # nameIDFormat: "urn:oasis:names:tc:SAML:1.1:nameid-format:unspecified" # If your SAML provider requires a specific nameid format
  rbac:
    enabled: false
    groups:
      - name: admin
        enabled: false # if admin is disabled, all SAML users will be able to make configuration changes to the kubecost frontend
        assertionName: "http://schemas.auth0.com/userType" # a SAML Assertion, one of whose elements has a value that matches one of the values in assertionValues
        assertionValues:
          - "admin"
          - "superusers"
      - name: readonly
        enabled: false # if readonly is disabled, all users authorized on SAML will default to readonly
        assertionName:  "http://schemas.auth0.com/userType"
        assertionvalues:
          - "readonly"

# Adds an httpProxy as an environment variable. systemProxy.enabled must be `true` to have any effect.
# Ref: https://www.oreilly.com/library/view/security-with-go/9781788627917/5ea6a02b-3d96-44b1-ad3c-6ab60fcbbe4f.xhtml
systemProxy:
  enabled: false
  httpProxyUrl: ""
  httpsProxyUrl: ""
  noProxy: ""

# imagePullSecrets:
# - name: "image-pull-secret"

# Manages Kubecost alerts
kubecostChecks:
  enabled: true
  image: "quay.io/kubecost1/checks"
  resources:
    limits:
      cpu: 50m
      memory: 150Mi
    requests:
      cpu: 20m
      memory: 75Mi
  tolerations:
    - effect: NoSchedule
      key: kubernetes.azure.com/scalesetpriority
      operator: Equal
      value: spot

kubecostFrontend:
  image: "gcr.io/kubecost1/frontend"
  imagePullPolicy: Always
  resources:
    requests:
      cpu: "10m"
      memory: "55Mi"
    #limits:
    #  cpu: "100m"
    #  memory: "256Mi"
  tolerations:
    - effect: NoSchedule
      key: kubernetes.azure.com/scalesetpriority
      operator: Equal
      value: spot

kubecost:
  image: "gcr.io/kubecost1/server"
  resources:
    requests:
      cpu: "100m"
      memory: "55Mi"
    #limits:
    #  cpu: "100m"
    #  memory: "256Mi"
  tolerations:
    - effect: NoSchedule
      key: kubernetes.azure.com/scalesetpriority
      operator: Equal
      value: spot

kubecostModel:
  image: "gcr.io/kubecost1/cost-model"
  imagePullPolicy: Always
  # Enables the emission of the kubecost_cloud_credit_total and
  # kubecost_cloud_expense_total metrics
  outOfClusterPromMetricsEnabled: false
  # Build local cost allocation cache
  warmCache: true
  # Build local savings cache
  warmSavingsCache: true
  # Run allocation ETL pipelines
  etl: true
  # The total number of days the ETL storage will build
  etlStoreDurationDays: 90
  # max number of concurrent Prometheus queries
  maxQueryConcurrency: 5
  # utcOffset represents a timezone in hours and minutes east (+) or west (-)
  # of UTC, itself, which is defined as +00:00.
  # See the tz database of timezones to look up your local UTC offset:
  # https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
  utcOffset: "-08:00"
  resources:
    requests:
      cpu: "200m"
      memory: "55Mi"
    #limits:
    #  cpu: "800m"
    #  memory: "256Mi"
  tolerations:
    - effect: NoSchedule
      key: kubernetes.azure.com/scalesetpriority
      operator: Equal
      value: spot

# Basic Kubecost ingress, more examples available at https://github.com/kubecost/docs/blob/master/ingress-examples.md
ingress:
  enabled: false
  annotations:
    kubernetes.io/ingress.class: nginx
    # kubernetes.io/tls-acme: "true"
  paths: ["/"] # There's no need to route specifically to the pods-- we have an nginx deployed that handles routing
  hosts:
    - cost-analyzer.local
  tls: []
  #  - secretName: cost-analyzer-tls
  #    hosts:
  #      - cost-analyzer.local

nodeSelector: {}

tolerations:
  - effect: NoSchedule
    key: kubernetes.azure.com/scalesetpriority
    operator: Equal
    value: spot

affinity: {}

# If true, creates a PriorityClass to be used by the cost-analyzer pod
priority:
  enabled: false
  # value: 1000000

# If true, enable creation of NetworkPolicy resources.
networkPolicy:
  enabled: false

podSecurityPolicy:
  enabled: true

# Define persistence volume for cost-analyzer, more information at https://github.com/kubecost/docs/blob/master/storage.md
persistentVolume:
  size: 32Gi
  dbSize: 32.0Gi
  enabled: true # Note that setting this to false means configurations will be wiped out on pod restart.
  storageClass: "longhorn"
  # existingClaim: kubecost-cost-analyzer # a claim in the same namespace as kubecost

service:
  type: ClusterIP
  port: 9090
  targetPort: 9090
  # nodePort:
  labels: {}
  annotations: {}

# Enabling long-term durable storage with Postgres requires an enterprise license
remoteWrite:
  postgres:
    enabled: false
    initImage: "gcr.io/kubecost1/sql-init"
    initImagePullPolicy: Always
    installLocal: true
    remotePostgresAddress: "" # ignored if installing locally
    persistentVolume:
      size: 200Gi
    auth:
      password: admin # change me

prometheus:
  extraScrapeConfigs: |
    - job_name: kubecost
      honor_labels: true
      scrape_interval: 1m
      scrape_timeout: 10s
      metrics_path: /metrics
      scheme: http
      dns_sd_configs:
      - names:
        - {{ template "cost-analyzer.serviceName" . }}
        type: 'A'
        port: 9003
    - job_name: kubecost-networking
      kubernetes_sd_configs:
        - role: pod
      relabel_configs:
      # Scrape only the targets matching the following metadata
        - source_labels: [__meta_kubernetes_pod_label_app]
          action: keep
          regex:  {{ template "cost-analyzer.networkCostsName" . }}
  server:
    # If clusterIDConfigmap is defined, instead use user-generated configmap with key CLUSTER_ID
    # to use as unique cluster ID in kubecost cost-analyzer deployment.
    # This overrides the cluster_id set in prometheus.server.global.external_labels.
    # NOTE: This does not affect the external_labels set in prometheus config.
    # clusterIDConfigmap: cluster-id-configmap

    resources: {}
    # limits:
    #   cpu: 500m
    #   memory: 512Mi
    # requests:
    #   cpu: 500m
    #   memory: 512Mi
    global:
      scrape_interval: 1m
      scrape_timeout: 10s
      evaluation_interval: 1m
      external_labels:
        cluster_id: cluster-one # Each cluster should have a unique ID
    persistentVolume:
      size: 32Gi
      enabled: true
      storageClass: 'longhorn'
    extraArgs:
      query.max-concurrency: 1
      query.max-samples: 100000000
    tolerations:
    - effect: NoSchedule
      key: kubernetes.azure.com/scalesetpriority
      operator: Equal
      value: spot
  alertmanager:
    enabled: true
    persistentVolume:
      enabled: true
      size: 2Gi
      storageClass: 'longhorn'
    tolerations:
    - effect: NoSchedule
      key: kubernetes.azure.com/scalesetpriority
      operator: Equal
      value: spot
  nodeExporter:
    enabled: true
    tolerations:
    - effect: NoSchedule
      key: kubernetes.azure.com/scalesetpriority
      operator: Equal
      value: spot
  pushgateway:
    enabled: false
    persistentVolume:
      enabled: true
  serverFiles:
  #  prometheus.yml: # Sample block -- enable if using an in cluster durable store.
  #      remote_write:
  #        - url: "http://pgprometheus-adapter:9201/write"
  #          write_relabel_configs:
  #            - source_labels: [__name__]
  #              regex: 'container_.*_allocation|container_.*_allocation_bytes|.*_hourly_cost|kube_pod_container_resource_requests_memory_bytes|container_memory_working_set_bytes|kube_pod_container_resource_requests_cpu_cores|kube_pod_container_resource_requests|pod_pvc_allocation|kube_namespace_labels|kube_pod_labels'
  #              action: keep
  #          queue_config:
  #            max_samples_per_send: 1000
        #remote_read:
        #  - url: "http://pgprometheus-adapter:9201/read"
    rules:
      groups:
        - name: CPU
          rules:
            - expr: sum(rate(container_cpu_usage_seconds_total{container_name!=""}[5m]))
              record: cluster:cpu_usage:rate5m
            - expr: rate(container_cpu_usage_seconds_total{container_name!=""}[5m])
              record: cluster:cpu_usage_nosum:rate5m
            - expr: avg(irate(container_cpu_usage_seconds_total{container_name!="POD", container_name!=""}[5m])) by (container_name,pod_name,namespace)
              record: kubecost_container_cpu_usage_irate
            - expr: sum(container_memory_working_set_bytes{container_name!="POD",container_name!=""}) by (container_name,pod_name,namespace)
              record: kubecost_container_memory_working_set_bytes
            - expr: sum(container_memory_working_set_bytes{container_name!="POD",container_name!=""})
              record: kubecost_cluster_memory_working_set_bytes
        - name: Savings
          rules:
            - expr: sum(avg(kube_pod_owner{owner_kind!="DaemonSet"}) by (pod) * sum(container_cpu_allocation) by (pod))
              record: kubecost_savings_cpu_allocation
              labels:
                daemonset: "false"
            - expr: sum(avg(kube_pod_owner{owner_kind="DaemonSet"}) by (pod) * sum(container_cpu_allocation) by (pod)) / sum(kube_node_info)
              record: kubecost_savings_cpu_allocation
              labels:
                daemonset: "true"
            - expr: sum(avg(kube_pod_owner{owner_kind!="DaemonSet"}) by (pod) * sum(container_memory_allocation_bytes) by (pod))
              record: kubecost_savings_memory_allocation_bytes
              labels:
                daemonset: "false"
            - expr: sum(avg(kube_pod_owner{owner_kind="DaemonSet"}) by (pod) * sum(container_memory_allocation_bytes) by (pod)) / sum(kube_node_info)
              record: kubecost_savings_memory_allocation_bytes
              labels:
                daemonset: "true"
            - expr: label_replace(sum(kube_pod_status_phase{phase="Running",namespace!="kube-system"} > 0) by (pod, namespace), "pod_name", "$1", "pod", "(.+)")
              record: kubecost_savings_running_pods
            - expr: sum(rate(container_cpu_usage_seconds_total{container_name!="",container_name!="POD",instance!=""}[5m])) by (namespace, pod_name, container_name, instance)
              record: kubecost_savings_container_cpu_usage_seconds
            - expr: sum(container_memory_working_set_bytes{container_name!="",container_name!="POD",instance!=""}) by (namespace, pod_name, container_name, instance)
              record: kubecost_savings_container_memory_usage_bytes
            - expr: avg(sum(kube_pod_container_resource_requests_cpu_cores{namespace!="kube-system"}) by (pod, namespace, instance)) by (pod, namespace)
              record: kubecost_savings_pod_requests_cpu_cores
            - expr: avg(sum(kube_pod_container_resource_requests_memory_bytes{namespace!="kube-system"}) by (pod, namespace, instance)) by (pod, namespace)
              record: kubecost_savings_pod_requests_memory_bytes

## Module for measuring network costs
## Ref: https://github.com/kubecost/docs/blob/master/network-allocation.md
networkCosts:
  enabled: true
  podSecurityPolicy:
    enabled: false
  image: gcr.io/kubecost1/kubecost-network-costs:v15.0
  imagePullPolicy: Always
  # Traffic Logging will enable logging the top 5 destinations for each source
  # every 30 minutes.
  trafficLogging: true
  # Port will set both the containerPort and hostPort to this value.
  # These must be identical due to network-costs being run on hostNetwork
  port: 3001
  resources: {}
    #requests:
    #  cpu: "50m"
    #  memory: "20Mi"
  config:
    # Configuration for traffic destinations, including specific classification
    # for IPs and CIDR blocks. This configuration will act as an override to the
    # automatic classification provided by network-costs.
    destinations:
      # In Zone contains a list of address/range that will be
      # classified as in zone.
      in-zone:
        # Loopback
        - "127.0.0.1"
        # IPv4 Link Local Address Space
        - "169.254.0.0/16"
        # Private Address Ranges in RFC-1918
        - "10.0.0.0/8"
        - "172.16.0.0/12"
        - "192.168.0.0/16"

      # In Region contains a list of address/range that will be
      # classified as in region. This is synonymous with cross
      # zone traffic, where the regions between source and destinations
      # are the same, but the zone is different.
      in-region: []

      # Cross Region contains a list of address/range that will be
      # classified as non-internet egress from one region to another.
      cross-region: []

      # Direct Classification specifically maps an ip address or range
      # to a region (required) and/or zone (optional). This classification
      # takes priority over in-zone, in-region, and cross-region configurations.
      direct-classification: []
      # - region: "us-east1"
      #   zone: "us-east1-c"
      #   ips:
      #     - "10.0.0.0/24"

  ## Node tolerations for server scheduling to nodes with taints
  ## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
  ##
  tolerations:
  - effect: NoSchedule
    key: kubernetes.azure.com/scalesetpriority
    operator: Equal
    value: spot

  ## PriorityClassName
  ## Ref: https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass
  priorityClassName: []
  ## PodMonitor
  ## Allows scraping of network metrics from a dedicated prometheus operator setup
  podMonitor:
    enabled: false
    additionalLabels: {}

# Kubecost Deployment Configuration
# Used for HA mode in Business & Enterprise tier
kubecostDeployment:
  replicas: 1

# Kubecost Cluster Controller for Right Sizing and Cluster Turndown
clusterController:
  enabled: false
  image: gcr.io/kubecost1/cluster-controller:v0.0.2
  imagePullPolicy: Always

reporting:
  # Kubecost bug report feature: Logs access/collection limited to .Release.Namespace
  # Ref: http://docs.kubecost.com/bug-report
  logCollection: true
  # Basic frontend analytics
  productAnalytics: true
  # Report Javascript errors
  errorReporting: true
  valuesReporting: true

serviceMonitor:
  enabled: false
  additionalLabels: {}

prometheusRule:
  enabled: true
  additionalLabels: {}

supportNFS: true
# initChownDataImage ensures all Kubecost filepath permissions on PV or local storage are set up correctly.
initChownDataImage: "busybox" # Supports a fully qualified Docker image, e.g. registry.hub.docker.com/library/busybox:latest
initChownData:
  resources: {}
    #requests:
    #  cpu: "50m"
    #  memory: "20Mi"

grafana:
  # namespace_datasources: kubecost # override the default namespace here
  # namespace_dashboards: kubecost # override the default namespace here
  sidecar:
    dashboards:
      enabled: true
      # label that the configmaps with dashboards are marked with
      label: grafana_dashboard
    datasources:
      # dataSourceFilename: foo.yml # If you need to change the name of the datasource file
      enabled: true
      defaultDatasourceEnabled: false
      dataSourceName: default-kubecost
      # label that the configmaps with datasources are marked with
      label: kubecost_grafana_datasource
#  For grafana to be accessible, add the path to root_url. For example, if you run kubecost at www.foo.com:9090/kubecost
#  set root_url to "%(protocol)s://%(domain)s:%(http_port)s/kubecost/grafana". No change is necessary here if kubecost runs at a root URL
  grafana.ini:
    server:
      root_url: "%(protocol)s://%(domain)s:%(http_port)s/grafana"
serviceAccount:
  create: true # Set this to false if you're bringing your own service account.
  annotations: {}
  # name: kc-test

# readonly: false # disable updates to kubecost from the frontend UI and via POST request

# These configs can also be set from the Settings page in the Kubecost product UI
# Values in this block override config changes in the Settings UI on pod restart
#
#kubecostProductConfigs:
# An optional list of cluster definitions that can be added for frontend access. The local
# cluster is *always* included by default, so this list is for non-local clusters.
# Ref: https://github.com/kubecost/docs/blob/master/multi-cluster.md
#  clusters:
#   - name: "Cluster A"
#     address: http://cluster-a.kubecost.com:9090
#     # Optional authentication credentials - only basic auth is currently supported.
#     auth:
#       type: basic
#       # Secret name should be a secret formatted based on: https://github.com/kubecost/docs/blob/master/ingress-examples.md
#       secretName: cluster-a-auth
#       # Or pass auth directly as base64 encoded user:pass
#       data: YWRtaW46YWRtaW4=
#       # Or user and pass directly
#       user: admin
#       pass: admin
#   - name: "Cluster B"
#     address: http://cluster-b.kubecost.com:9090
#  defaultModelPricing: # default monthly resource prices, used predominately for on-prem clusters
#    CPU: 28.0
#    spotCPU: 4.86
#    RAM: 3.09
#    spotRAM: 0.65
#    GPU: 693.50
#    spotGPU: 225.0
#    storage: 0.04
#    zoneNetworkEgress: 0.01
#    regionNetworkEgress: 0.01
#    internetNetworkEgress: 0.12
#    enabled: true
#  # The cluster profile represents a predefined set of parameters to use when calculating savings.
#  # Possible values are: [ development, production, high-availability ]
#  clusterProfile: production
#  customPricesEnabled: false # This makes the default view custom prices-- generally used for on-premises clusters
#  spotLabel: lifecycle
#  spotLabelValue: Ec2Spot
#  gpuLabel: gpu
#  gpuLabelValue: true
#  awsServiceKeyName: ACCESSKEYID
#  awsServiceKeyPassword:  fakepassword # Only use if your values.yaml are stored encrypted. Otherwise provide an existing secret via serviceKeySecretName
#  awsSpotDataRegion: us-east-1
#  awsSpotDataBucket: spot-data-feed-s3-bucket
#  awsSpotDataPrefix: dev
#  athenaProjectID: "530337586277" # The AWS AccountID where the Athena CUR is. Generally your masterpayer account
#  athenaBucketName: "s3://aws-athena-query-results-530337586277-us-east-1"
#  athenaRegion: us-east-1
#  athenaDatabase: athenacurcfn_athena_test1
#  athenaTable: "athena_test1"
#  masterPayerARN: ""
#  projectID: "123456789"  # Also known as AccountID on AWS -- the current account/project that this instance of Kubecost is deployed on.
#  gcpSecretName: gcp-secret # Name of a secret representing the gcp service key
#  bigQueryBillingDataDataset: billing_data.gcp_billing_export_v1_01AC9F_74CF1D_5565A2
#  labelMappingConfigs:  # names of k8s labels used to designate different allocation concepts
#    enabled: true
#    owner_label: "owner"
#    team_label: "team"
#    department_label: "dept"
#    product_label: "product"
#    environment_label: "env"
#    namespace_external_label: "kubernetes_namespace" # external labels are used to map external cloud costs to kubernetes concepts
#    cluster_external_label: "kubernetes_cluster"
#    controller_external_label: "kubernetes_controller"
#    product_external_label: "kubernetes_label_app"
#    service_external_label: "kubernetes_service"
#    deployment_external_label: "kubernetes_deployment"
#    team_external_label: "kubernetes_label_team"
#    environment_external_label: "kubernetes_label_env"
#    department_external_label: "kubernetes_label_department"
#    statefulset_external_label: "kubernetes_statefulset"
#    daemonset_external_label: "kubernetes_daemonset"
#    pod_external_label: "kubernetes_pod"
#  grafanaURL: ""
#  clusterName: "" # used for display in Kubecost UI
#  currencyCode: "USD" # official support for USD, CAD, EUR, and CHF
#  azureBillingRegion: US # Represents 2-letter region code, e.g. West Europe = NL, Canada = CA. ref: https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes
#  azureSubscriptionID: 0bd50fdf-c923-4e1e-850c-196dd3dcc5d3
#  azureClientID: f2ef6f7d-71fb-47c8-b766-8d63a19db017
#  azureTenantID: 72faf3ff-7a3f-4597-b0d9-7b0b201bb23a
#  azureClientPassword: fake key # Only use if your values.yaml are stored encrypted. Otherwise provide an existing secret via serviceKeySecretName
#  azureStorageAccount: "kubecoststore" # Name of Azure Storage account where cost report is being saved
#  azureStorageAccessKey: "" # Azure Storage account access key found in Azure Storage portal for account where cost report is being saved
#  azureStorageContainer: "test" # Name of daily month-to-date cost report for Azure
#  azureStorageSecretName: "azure-storage-config" # Name of Kubernetes Secret where Azure Storage Configuration is stored
#  azureStorageCreateSecret: true # Create secret for Azure Storage config from values in this file
#  discount: "" # percentage discount applied to compute
#  negotiatedDiscount: "" # custom negotiated cloud provider discount
#  defaultIdle: false
#  serviceKeySecretName: "" # Use an existing AWS or Azure secret with format as in aws-service-key-secret.yaml or azure-service-key-secret.yaml. Leave blank if using createServiceKeySecret
#  createServiceKeySecret: true # Creates a secret representing your cloud service key based on data in values.yaml. If you are storing unencrypted values, add a secret manually
#  sharedNamespaces: "" # namespaces with shared workloads, example value: "kube-system\,ingress-nginx\,kubecost\,monitoring"
#  productKey: # apply business or enterprise product license
#    key: ""
#    enabled: false
#    secretname: productkeysecret # create a secret out of a file named productkey.json of format { "key": "kc-b1325234" }
dwbrown2 commented 3 years ago

@munjalpatel can you open one of these graphs and see what response you are getting from Prometheus? This appears to be an issue with your Prometheus rather than with Grafana. If so, it could be networking-related, but it is more likely related to the state of your Prometheus.

If you're not getting a response, I would try running a query directly in Prometheus: https://github.com/kubecost/docs/blob/master/prometheus.md
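One way to follow this suggestion from a script is to call the Prometheus HTTP API's instant-query endpoint (`/api/v1/query`) and check whether a Kubecost metric returns any series at all; an empty `result` list would be consistent with blank dashboard panels. The sketch below is a minimal, hypothetical helper, not part of Kubecost: it assumes the bundled Prometheus server has been exposed locally (for example with `kubectl port-forward`), and `PROMETHEUS_URL` with port 9091 is a placeholder. It uses `node_cpu_hourly_cost`, one of the metrics the Kubecost "CPU Cost" panel queries, as the example.

```python
import json
import urllib.parse
import urllib.request

# Placeholder: assumes the bundled Prometheus server has been exposed locally,
# e.g. with `kubectl port-forward`. Adjust host and port for your cluster.
PROMETHEUS_URL = "http://localhost:9091"


def build_query_url(base_url: str, promql: str) -> str:
    """Build the URL for a Prometheus instant query (GET /api/v1/query)."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def series_count(response_body: str) -> int:
    """Return how many series an /api/v1/query response contains.

    Raises RuntimeError if Prometheus reported a status other than "success".
    """
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    return len(payload["data"]["result"])


def query_series_count(base_url: str, promql: str, timeout: float = 10.0) -> int:
    """Run an instant query against a live Prometheus and count the matching series.

    Raises urllib.error.URLError (or its subclass HTTPError) if the request fails.
    """
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return series_count(resp.read().decode("utf-8"))
```

For example, `query_series_count(PROMETHEUS_URL, "node_cpu_hourly_cost")` returning 0 would suggest Prometheus has no Kubecost cost data to serve, while a non-zero count would point the investigation back toward the Grafana side.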

munjalpatel commented 3 years ago

Prometheus seems to be working fine:


Here is the response my browser gets from a GET request to http://localhost:9090/grafana/api/dashboards/uid/JOUdHGZZz:

{
  "meta": {
    "type": "db",
    "canSave": true,
    "canEdit": true,
    "canAdmin": false,
    "canStar": false,
    "slug": "kubecost-cluster-metrics",
    "url": "/grafana/d/JOUdHGZZz/kubecost-cluster-metrics",
    "expires": "0001-01-01T00:00:00Z",
    "created": "2021-02-21T19:36:36Z",
    "updated": "2021-02-21T19:36:36Z",
    "updatedBy": "Anonymous",
    "createdBy": "Anonymous",
    "version": 1,
    "hasAcl": false,
    "isFolder": false,
    "folderId": 0,
    "folderTitle": "General",
    "folderUrl": "",
    "provisioned": true,
    "provisionedExternalId": "cluster-metrics.json"
  },
  "dashboard": {
    "annotations": {
      "list": [
        {
          "builtIn": 1,
          "datasource": "-- Grafana --",
          "enable": true,
          "hide": true,
          "iconColor": "rgba(0, 211, 255, 1)",
          "name": "Annotations \u0026 Alerts",
          "type": "dashboard"
        }
      ]
    },
    "description": "Cost metrics from the Kubecost product",
    "editable": true,
    "gnetId": null,
    "graphTooltip": 0,
    "id": 8,
    "iteration": 1558062099204,
    "links": [],
    "panels": [
      {
        "content": "Note: this dashboard requires Kubecost metrics to be available in your Prometheus deployment. [Learn more](https://github.com/kubecost/cost-model/blob/master/PROMETHEUS.md)",
        "gridPos": { "h": 2, "w": 24, "x": 0, "y": 0 },
        "id": 27,
        "links": [],
        "mode": "markdown",
        "title": "",
        "transparent": true,
        "type": "text"
      },
      {
        "cacheTimeout": null,
        "colorBackground": false,
        "colorValue": false,
        "colors": ["#299c46", "rgba(237, 129, 40, 0.89)", "#d44a3a"],
        "datasource": "default-kubecost",
        "decimals": 2,
        "description": "Monthly run rate of CPU + GPU costs based on currently provisioned resources.",
        "format": "currencyUSD",
        "gauge": {
          "maxValue": 100,
          "minValue": 0,
          "show": false,
          "thresholdLabels": false,
          "thresholdMarkers": true
        },
        "gridPos": { "h": 3, "w": 6, "x": 0, "y": 2 },
        "hideTimeOverride": true,
        "id": 2,
        "interval": null,
        "links": [],
        "mappingType": 1,
        "mappingTypes": [
          { "name": "value to text", "value": 1 },
          { "name": "range to text", "value": 2 }
        ],
        "maxDataPoints": 100,
        "nullPointMode": "connected",
        "nullText": null,
        "postfix": "",
        "postfixFontSize": "50%",
        "prefix": "",
        "prefixFontSize": "50%",
        "rangeMaps": [{ "from": "null", "text": "N/A", "to": "null" }],
        "sparkline": {
          "fillColor": "rgba(31, 118, 189, 0.18)",
          "full": true,
          "lineColor": "rgb(31, 120, 193)",
          "show": false
        },
        "tableColumn": "label_cloud_google_com_gke_preemptible",
        "targets": [
          {
            "expr": "sum(\n  avg(kube_node_status_capacity_cpu_cores) by (node) * avg(node_cpu_hourly_cost) by (node) * 730 * (1-$useDiscount/100) +\n  avg(node_gpu_hourly_cost) by (node) * 730 * (1-$useDiscount/100)\n)",
            "format": "time_series",
            "instant": true,
            "interval": "",
            "intervalFactor": 1,
            "legendFormat": " {{ node }}",
            "refId": "A"
          }
        ],
        "thresholds": "",
        "timeFrom": "15m",
        "timeShift": null,
        "title": "CPU Cost",
        "type": "singlestat",
        "valueFontSize": "80%",
        "valueMaps": [{ "op": "=", "text": "N/A", "value": "null" }],
        "valueName": "current"
      },
      {
        "cacheTimeout": null,
        "colorBackground": false,
        "colorValue": false,
        "colors": ["#299c46", "rgba(237, 129, 40, 0.89)", "#d44a3a"],
        "datasource": "default-kubecost",
        "decimals": 2,
        "description": "Monthly run rate of memory costs based on currently provisioned resources.",
        "format": "currencyUSD",
        "gauge": {
          "maxValue": 100,
          "minValue": 0,
          "show": false,
          "thresholdLabels": false,
          "thresholdMarkers": true
        },
        "gridPos": { "h": 3, "w": 6, "x": 6, "y": 2 },
        "hideTimeOverride": true,
        "id": 3,
        "interval": null,
        "links": [],
        "mappingType": 1,
        "mappingTypes": [
          { "name": "value to text", "value": 1 },
          { "name": "range to text", "value": 2 }
        ],
        "maxDataPoints": 100,
        "nullPointMode": "connected",
        "nullText": null,
        "postfix": "",
        "postfixFontSize": "50%",
        "prefix": "",
        "prefixFontSize": "50%",
        "rangeMaps": [{ "from": "null", "text": "N/A", "to": "null" }],
        "sparkline": {
          "fillColor": "rgba(31, 118, 189, 0.18)",
          "full": false,
          "lineColor": "rgb(31, 120, 193)",
          "show": false
        },
        "tableColumn": "label_cloud_google_com_gke_preemptible",
        "targets": [
          {
            "expr": "sum(\n  avg(kube_node_status_capacity_memory_bytes) by (node) / 1024 / 1024 / 1024 * avg(node_ram_hourly_cost) by (node) * 730 * (1-$useDiscount/100)\n)",
            "format": "time_series",
            "instant": true,
            "interval": "",
            "intervalFactor": 1,
            "legendFormat": " {{ node }}",
            "refId": "A"
          }
        ],
        "thresholds": "",
        "timeFrom": "15m",
        "timeShift": null,
        "title": "Memory Cost",
        "type": "singlestat",
        "valueFontSize": "80%",
        "valueMaps": [{ "op": "=", "text": "N/A", "value": "null" }],
        "valueName": "current"
      },
      {
        "cacheTimeout": null,
        "colorBackground": false,
        "colorValue": false,
        "colors": ["#299c46", "rgba(237, 129, 40, 0.89)", "#d44a3a"],
        "datasource": "default-kubecost",
        "decimals": 2,
        "description": "Monthly run rate of attached storage and PV costs based on currently provisioned resources.",
        "format": "currencyUSD",
        "gauge": {
          "maxValue": 100,
          "minValue": 0,
          "show": false,
          "thresholdLabels": false,
          "thresholdMarkers": true
        },
        "gridPos": { "h": 3, "w": 6, "x": 12, "y": 2 },
        "hideTimeOverride": true,
        "id": 4,
        "interval": null,
        "links": [],
        "mappingType": 1,
        "mappingTypes": [
          { "name": "value to text", "value": 1 },
          { "name": "range to text", "value": 2 }
        ],
        "maxDataPoints": 100,
        "nullPointMode": "connected",
        "nullText": null,
        "postfix": "",
        "postfixFontSize": "50%",
        "prefix": "",
        "prefixFontSize": "50%",
        "rangeMaps": [{ "from": "null", "text": "N/A", "to": "null" }],
        "sparkline": {
          "fillColor": "rgba(31, 118, 189, 0.18)",
          "full": false,
          "lineColor": "rgb(31, 120, 193)",
          "show": false
        },
        "tableColumn": "label_cloud_google_com_gke_preemptible",
        "targets": [
          {
            "expr": "sum(avg(pv_hourly_cost) by (persistentvolume) * 730 * avg(kube_persistentvolume_capacity_bytes) by (persistentvolume) / 1024 / 1024 / 1024) \n+\nsum(sum(container_fs_limit_bytes{device!=\"tmpfs\", id=\"/\"}) by (instance) / 1024 / 1024 / 1024) * $localStorageGBCost",
            "format": "time_series",
            "instant": true,
            "interval": "",
            "intervalFactor": 1,
            "legendFormat": " {{ node }}",
            "refId": "A"
          }
        ],
        "thresholds": "",
        "timeFrom": "15m",
        "timeShift": null,
        "title": "Storage Cost",
        "type": "singlestat",
        "valueFontSize": "80%",
        "valueMaps": [{ "op": "=", "text": "N/A", "value": "null" }],
        "valueName": "current"
      },
      {
        "cacheTimeout": null,
        "colorBackground": false,
        "colorValue": false,
        "colors": ["#299c46", "rgba(237, 129, 40, 0.89)", "#d44a3a"],
        "datasource": "default-kubecost",
        "decimals": 2,
        "description": "Sum of compute, memory, and storage costs.",
        "format": "currencyUSD",
        "gauge": {
          "maxValue": 100,
          "minValue": 0,
          "show": false,
          "thresholdLabels": false,
          "thresholdMarkers": true
        },
        "gridPos": { "h": 7, "w": 6, "x": 18, "y": 2 },
        "hideTimeOverride": true,
        "id": 11,
        "interval": null,
        "links": [],
        "mappingType": 1,
        "mappingTypes": [
          { "name": "value to text", "value": 1 },
          { "name": "range to text", "value": 2 }
        ],
        "maxDataPoints": 100,
        "nullPointMode": "connected",
        "nullText": null,
        "postfix": "",
        "postfixFontSize": "50%",
        "prefix": "",
        "prefixFontSize": "50%",
        "rangeMaps": [{ "from": "null", "text": "N/A", "to": "null" }],
        "sparkline": {
          "fillColor": "rgba(31, 118, 189, 0.18)",
          "full": false,
          "lineColor": "rgb(31, 120, 193)",
          "show": false
        },
        "tableColumn": "label_cloud_google_com_gke_preemptible",
        "targets": [
          {
            "expr": "# Compute\nsum(\n  avg(kube_node_status_capacity_cpu_cores) by (node) * avg(node_cpu_hourly_cost) by (node) * 730 * (1-$useDiscount/100) +\n  avg(node_gpu_hourly_cost) by (node) * 730 * (1-$useDiscount/100)\n) +\n\n\n# Memory\nsum(\n  avg(kube_node_status_capacity_memory_bytes) by (node) / 1024 / 1024 / 1024 * avg(node_ram_hourly_cost) by (node) * 730 * (1-$useDiscount/100)\n) +\n\n# Storage \n\nsum(avg(pv_hourly_cost) by (persistentvolume) * 730 * avg(kube_persistentvolume_capacity_bytes) by (persistentvolume) / 1024 / 1024 / 1024) \n+\nsum(sum(container_fs_limit_bytes{device!=\"tmpfs\", id=\"/\"}) by (instance) / 1024 / 1024 / 1024) * $localStorageGBCost",
            "format": "time_series",
            "instant": true,
            "interval": "",
            "intervalFactor": 1,
            "legendFormat": " {{ node }}",
            "refId": "A"
          }
        ],
        "thresholds": "",
        "timeFrom": "15m",
        "timeShift": null,
        "title": "Total Cost",
        "type": "singlestat",
        "valueFontSize": "120%",
        "valueMaps": [{ "op": "=", "text": "N/A", "value": "null" }],
        "valueName": "current"
      },
      {
        "cacheTimeout": null,
        "colorBackground": false,
        "colorValue": true,
        "colors": [
          "rgba(245, 54, 54, 0.9)",
          "rgba(50, 172, 45, 0.97)",
          "#c15c17"
        ],
        "datasource": "default-kubecost",
        "decimals": 2,
        "description": "Current CPU use from applications divided by allocatable CPUs",
        "editable": true,
        "error": false,
        "format": "percent",
        "gauge": {
          "maxValue": 100,
          "minValue": 0,
          "show": true,
          "thresholdLabels": false,
          "thresholdMarkers": true
        },
        "gridPos": { "h": 4, "w": 3, "x": 0, "y": 5 },
        "height": "180px",
        "hideTimeOverride": true,
        "id": 13,
        "interval": null,
        "isNew": true,
        "links": [],
        "mappingType": 1,
        "mappingTypes": [
          { "name": "value to text", "value": 1 },
          { "name": "range to text", "value": 2 }
        ],
        "maxDataPoints": 100,
        "nullPointMode": "connected",
        "nullText": null,
        "postfix": "",
        "postfixFontSize": "50%",
        "prefix": "",
        "prefixFontSize": "50%",
        "rangeMaps": [{ "from": "null", "text": "N/A", "to": "null" }],
        "sparkline": {
          "fillColor": "rgba(31, 118, 189, 0.18)",
          "full": false,
          "lineColor": "rgb(31, 120, 193)",
          "show": false
        },
        "tableColumn": "",
        "targets": [
          {
            "expr": "(\n sum(\n   count(irate(container_cpu_usage_seconds_total{id=\"/\"}[10m])) by (instance)\n   * on (instance) \n   sum(irate(container_cpu_usage_seconds_total{id=\"/\"}[10m])) by (instance)\n ) \n / \n (sum (kube_node_status_allocatable_cpu_cores))\n) * 100",
            "format": "time_series",
            "interval": "",
            "intervalFactor": 1,
            "refId": "A",
            "step": 10
          }
        ],
        "thresholds": "30, 80",
        "timeFrom": "",
        "title": "CPU Utilization",
        "type": "singlestat",
        "valueFontSize": "80%",
        "valueMaps": [{ "op": "=", "text": "N/A", "value": "null" }],
        "valueName": "current"
      },
      {
        "cacheTimeout": null,
        "colorBackground": false,
        "colorValue": true,
        "colors": [
          "rgba(245, 54, 54, 0.9)",
          "rgba(50, 172, 45, 0.97)",
          "#c15c17"
        ],
        "datasource": "default-kubecost",
        "decimals": 2,
        "description": "Current CPU reservation requests from applications vs allocatable CPU",
        "editable": true,
        "error": false,
        "format": "percent",
        "gauge": {
          "maxValue": 100,
          "minValue": 0,
          "show": true,
          "thresholdLabels": false,
          "thresholdMarkers": true
        },
        "gridPos": { "h": 4, "w": 3, "x": 3, "y": 5 },
        "height": "180px",
        "id": 15,
        "interval": null,
        "isNew": true,
        "links": [],
        "mappingType": 1,
        "mappingTypes": [
          { "name": "value to text", "value": 1 },
          { "name": "range to text", "value": 2 }
        ],
        "maxDataPoints": 100,
        "nullPointMode": "connected",
        "nullText": null,
        "postfix": "",
        "postfixFontSize": "50%",
        "prefix": "",
        "prefixFontSize": "50%",
        "rangeMaps": [{ "from": "null", "text": "N/A", "to": "null" }],
        "sparkline": {
          "fillColor": "rgba(31, 118, 189, 0.18)",
          "full": false,
          "lineColor": "rgb(31, 120, 193)",
          "show": false
        },
        "tableColumn": "",
        "targets": [
          {
            "expr": "sum(kube_pod_container_resource_requests_cpu_cores) / sum(kube_node_status_allocatable_cpu_cores) * 100",
            "format": "time_series",
            "interval": "",
            "intervalFactor": 1,
            "refId": "A",
            "step": 10
          }
        ],
        "thresholds": "30, 80",
        "title": "CPU Requests",
        "type": "singlestat",
        "valueFontSize": "80%",
        "valueMaps": [{ "op": "=", "text": "N/A", "value": "null" }],
        "valueName": "current"
      },
      {
        "cacheTimeout": null,
        "colorBackground": false,
        "colorValue": true,
        "colors": [
          "rgba(245, 54, 54, 0.9)",
          "rgba(50, 172, 45, 0.97)",
          "#c15c17"
        ],
        "datasource": "default-kubecost",
        "description": "Current RAM use vs allocatable RAM",
        "editable": true,
        "error": false,
        "format": "percent",
        "gauge": {
          "maxValue": 100,
          "minValue": 0,
          "show": true,
          "thresholdLabels": false,
          "thresholdMarkers": true
        },
        "gridPos": { "h": 4, "w": 3, "x": 6, "y": 5 },
        "height": "180px",
        "hideTimeOverride": true,
        "id": 17,
        "interval": null,
        "isNew": true,
        "links": [],
        "mappingType": 1,
        "mappingTypes": [
          { "name": "value to text", "value": 1 },
          { "name": "range to text", "value": 2 }
        ],
        "maxDataPoints": 100,
        "nullPointMode": "connected",
        "nullText": null,
        "postfix": "",
        "postfixFontSize": "50%",
        "prefix": "",
        "prefixFontSize": "50%",
        "rangeMaps": [{ "from": "null", "text": "N/A", "to": "null" }],
        "sparkline": {
          "fillColor": "rgba(31, 118, 189, 0.18)",
          "full": false,
          "lineColor": "rgb(31, 120, 193)",
          "show": false
        },
        "tableColumn": "",
        "targets": [
          {
            "expr": "sum(container_memory_usage_bytes{namespace!=\"\"}) / sum(kube_node_status_allocatable_memory_bytes) * 100",
            "format": "time_series",
            "interval": "",
            "intervalFactor": 1,
            "refId": "A",
            "step": 10
          }
        ],
        "thresholds": "30,80",
        "timeFrom": "",
        "title": "RAM Utilization",
        "transparent": false,
        "type": "singlestat",
        "valueFontSize": "80%",
        "valueMaps": [{ "op": "=", "text": "N/A", "value": "null" }],
        "valueName": "current"
      },
      {
        "cacheTimeout": null,
        "colorBackground": false,
        "colorValue": true,
        "colors": [
          "rgba(245, 54, 54, 0.9)",
          "rgba(50, 172, 45, 0.97)",
          "#c15c17"
        ],
        "datasource": "default-kubecost",
        "description": "Current RAM requests vs allocatable RAM",
        "editable": true,
        "error": false,
        "format": "percent",
        "gauge": {
          "maxValue": 100,
          "minValue": 0,
          "show": true,
          "thresholdLabels": false,
          "thresholdMarkers": true
        },
        "gridPos": { "h": 4, "w": 3, "x": 9, "y": 5 },
        "height": "180px",
        "id": 19,
        "interval": null,
        "isNew": true,
        "links": [],
        "mappingType": 1,
        "mappingTypes": [
          { "name": "value to text", "value": 1 },
          { "name": "range to text", "value": 2 }
        ],
        "maxDataPoints": 100,
        "nullPointMode": "connected",
        "nullText": null,
        "postfix": "",
        "postfixFontSize": "50%",
        "prefix": "",
        "prefixFontSize": "50%",
        "rangeMaps": [{ "from": "null", "text": "N/A", "to": "null" }],
        "sparkline": {
          "fillColor": "rgba(31, 118, 189, 0.18)",
          "full": false,
          "lineColor": "rgb(31, 120, 193)",
          "show": false
        },
        "tableColumn": "",
        "targets": [
          {
            "expr": "(\n sum(kube_pod_container_resource_requests_memory_bytes{namespace!=\"\"})\n /\n sum(kube_node_status_allocatable_memory_bytes)\n) * 100",
            "format": "time_series",
            "interval": "",
            "intervalFactor": 1,
            "refId": "A",
            "step": 10
          }
        ],
        "thresholds": "30,80",
        "title": "RAM Requests",
        "transparent": false,
        "type": "singlestat",
        "valueFontSize": "80%",
        "valueMaps": [{ "op": "=", "text": "N/A", "value": "null" }],
        "valueName": "current"
      },
      {
        "cacheTimeout": null,
        "colorBackground": false,
        "colorValue": true,
        "colors": [
          "rgba(245, 54, 54, 0.9)",
          "rgba(50, 172, 45, 0.97)",
          "#c15c17"
        ],
        "datasource": "default-kubecost",
        "decimals": 2,
        "description": "Current storage use (PVs plus node root disks) vs provisioned storage",
        "editable": true,
        "error": false,
        "format": "percent",
        "gauge": {
          "maxValue": 100,
          "minValue": 0,
          "show": true,
          "thresholdLabels": false,
          "thresholdMarkers": true
        },
        "gridPos": { "h": 4, "w": 6, "x": 12, "y": 5 },
        "height": "180px",
        "hideTimeOverride": true,
        "id": 21,
        "interval": null,
        "isNew": true,
        "links": [],
        "mappingType": 1,
        "mappingTypes": [
          { "name": "value to text", "value": 1 },
          { "name": "range to text", "value": 2 }
        ],
        "maxDataPoints": 100,
        "nullPointMode": "connected",
        "nullText": null,
        "postfix": "",
        "postfixFontSize": "50%",
        "prefix": "",
        "prefixFontSize": "50%",
        "rangeMaps": [{ "from": "null", "text": "N/A", "to": "null" }],
        "sparkline": {
          "fillColor": "rgba(31, 118, 189, 0.18)",
          "full": false,
          "lineColor": "rgb(31, 120, 193)",
          "show": false
        },
        "tableColumn": "",
        "targets": [
          {
            "expr": "sum (\n sum(kube_persistentvolumeclaim_info) by (persistentvolumeclaim, namespace, storageclass)\n + on (persistentvolumeclaim, namespace) group_right(storageclass)\n sum(kubelet_volume_stats_used_bytes) by (persistentvolumeclaim, namespace) or up * 0\n + sum(container_fs_usage_bytes{device=~\"^/dev/[sv]d[a-z][1-9]$\",id=\"/\"})\n) /\nsum (\n sum(kube_persistentvolumeclaim_info) by (persistentvolumeclaim, namespace, storageclass)\n + on (persistentvolumeclaim, namespace) group_right(storageclass)\n sum(kube_persistentvolumeclaim_resource_requests_storage_bytes) by (persistentvolumeclaim, namespace) or up * 0\n + sum(container_fs_limit_bytes{device=~\"^/dev/[sv]d[a-z][1-9]$\",id=\"/\"})\n) * 100",
            "format": "time_series",
            "interval": "",
            "intervalFactor": 1,
            "refId": "A",
            "step": 10
          }
        ],
        "thresholds": "30, 80",
        "timeFrom": "",
        "title": "Storage Utilization",
        "type": "singlestat",
        "valueFontSize": "80%",
        "valueMaps": [{ "op": "=", "text": "N/A", "value": "null" }],
        "valueName": "current"
      },
      {
        "aliasColors": {},
        "bars": false,
        "dashLength": 10,
        "dashes": false,
        "datasource": "default-kubecost",
        "description": "Monthly run rate of CPU + GPU costs",
        "fill": 1,
        "gridPos": { "h": 7, "w": 6, "x": 0, "y": 9 },
        "id": 6,
        "interval": "1m",
        "legend": {
          "avg": false,
          "current": false,
          "max": false,
          "min": false,
          "show": false,
          "total": false,
          "values": false
        },
        "lines": true,
        "linewidth": 1,
        "links": [],
        "nullPointMode": "null",
        "percentage": false,
        "pointradius": 5,
        "points": false,
        "renderer": "flot",
        "seriesOverrides": [],
        "spaceLength": 10,
        "stack": false,
        "steppedLine": false,
        "targets": [
          {
            "expr": "sum(\n  avg(kube_node_status_capacity_cpu_cores) by (node) * avg(node_cpu_hourly_cost) by (node) * 730 * (1-$useDiscount/100) +\n  avg(node_gpu_hourly_cost) by (node) * 730 * (1-$useDiscount/100)\n)",
            "format": "time_series",
            "intervalFactor": 1,
            "legendFormat": "compute cost",
            "refId": "A"
          }
        ],
        "thresholds": [],
        "timeFrom": null,
        "timeShift": null,
        "title": "Compute Cost",
        "tooltip": { "shared": true, "sort": 0, "value_type": "individual" },
        "type": "graph",
        "xaxis": {
          "buckets": null,
          "mode": "time",
          "name": null,
          "show": true,
          "values": []
        },
        "yaxes": [
          {
            "format": "currencyUSD",
            "label": null,
            "logBase": 1,
            "max": null,
            "min": "0",
            "show": true
          },
          {
            "format": "short",
            "label": null,
            "logBase": 1,
            "max": null,
            "min": null,
            "show": true
          }
        ],
        "yaxis": { "align": false, "alignLevel": null }
      },
      {
        "aliasColors": {},
        "bars": false,
        "dashLength": 10,
        "dashes": false,
        "datasource": "default-kubecost",
        "description": "Monthly run rate of memory costs",
        "fill": 1,
        "gridPos": { "h": 7, "w": 6, "x": 6, "y": 9 },
        "id": 9,
        "interval": "1m",
        "legend": {
          "avg": false,
          "current": false,
          "max": false,
          "min": false,
          "show": false,
          "total": false,
          "values": false
        },
        "lines": true,
        "linewidth": 1,
        "links": [],
        "nullPointMode": "null",
        "percentage": false,
        "pointradius": 5,
        "points": false,
        "renderer": "flot",
        "seriesOverrides": [],
        "spaceLength": 10,
        "stack": false,
        "steppedLine": false,
        "targets": [
          {
            "expr": "sum(\n  avg(kube_node_status_capacity_memory_bytes) by (node) / 1024 / 1024 / 1024 * avg(node_ram_hourly_cost) by (node) * 730 * (1-$useDiscount/100)\n)",
            "format": "time_series",
            "intervalFactor": 1,
            "legendFormat": "memory cost",
            "refId": "A"
          }
        ],
        "thresholds": [],
        "timeFrom": null,
        "timeShift": null,
        "title": "Memory Cost",
        "tooltip": { "shared": true, "sort": 0, "value_type": "individual" },
        "type": "graph",
        "xaxis": {
          "buckets": null,
          "mode": "time",
          "name": null,
          "show": true,
          "values": []
        },
        "yaxes": [
          {
            "format": "currencyUSD",
            "label": null,
            "logBase": 1,
            "max": null,
            "min": "0",
            "show": true
          },
          {
            "format": "short",
            "label": null,
            "logBase": 1,
            "max": null,
            "min": null,
            "show": true
          }
        ],
        "yaxis": { "align": false, "alignLevel": null }
      },
      {
        "aliasColors": {},
        "bars": false,
        "dashLength": 10,
        "dashes": false,
        "datasource": "default-kubecost",
        "description": "Monthly run rate of attached disk + PV storage costs",
        "fill": 1,
        "gridPos": { "h": 7, "w": 6, "x": 12, "y": 9 },
        "id": 10,
        "interval": "1m",
        "legend": {
          "avg": false,
          "current": false,
          "max": false,
          "min": false,
          "show": false,
          "total": false,
          "values": false
        },
        "lines": true,
        "linewidth": 1,
        "links": [],
        "nullPointMode": "null",
        "percentage": false,
        "pointradius": 5,
        "points": false,
        "renderer": "flot",
        "seriesOverrides": [],
        "spaceLength": 10,
        "stack": false,
        "steppedLine": false,
        "targets": [
          {
            "expr": "sum(\n  avg(avg_over_time(pv_hourly_cost[$timeRange] offset 1m)) by (persistentvolume) * 730 \n  * avg(avg_over_time(kube_persistentvolume_capacity_bytes[$timeRange] offset 1m)) by (persistentvolume) / 1024 / 1024 / 1024\n) +\nsum(avg(container_fs_limit_bytes{device!=\"tmpfs\", id=\"/\"}) by (instance) / 1024 / 1024 / 1024) * $localStorageGBCost",
            "format": "time_series",
            "intervalFactor": 1,
            "legendFormat": "storage cost",
            "refId": "A"
          }
        ],
        "thresholds": [],
        "timeFrom": null,
        "timeShift": null,
        "title": "Storage Cost",
        "tooltip": { "shared": true, "sort": 0, "value_type": "individual" },
        "type": "graph",
        "xaxis": {
          "buckets": null,
          "mode": "time",
          "name": null,
          "show": true,
          "values": []
        },
        "yaxes": [
          {
            "format": "currencyUSD",
            "label": null,
            "logBase": 1,
            "max": null,
            "min": "0",
            "show": true
          },
          {
            "format": "short",
            "label": null,
            "logBase": 1,
            "max": null,
            "min": null,
            "show": true
          }
        ],
        "yaxis": { "align": false, "alignLevel": null }
      },
      {
        "aliasColors": {},
        "bars": false,
        "dashLength": 10,
        "dashes": false,
        "datasource": "default-kubecost",
        "description": "Sum of compute, memory, and storage costs",
        "fill": 1,
        "gridPos": { "h": 7, "w": 6, "x": 18, "y": 9 },
        "id": 22,
        "interval": "1m",
        "legend": {
          "avg": false,
          "current": false,
          "max": false,
          "min": false,
          "show": false,
          "total": false,
          "values": false
        },
        "lines": true,
        "linewidth": 1,
        "links": [],
        "nullPointMode": "null",
        "percentage": false,
        "pointradius": 5,
        "points": false,
        "renderer": "flot",
        "seriesOverrides": [],
        "spaceLength": 10,
        "stack": false,
        "steppedLine": false,
        "targets": [
          {
            "expr": "# Compute\nsum(\n  avg(kube_node_status_capacity_cpu_cores) by (node) * avg(node_cpu_hourly_cost) by (node) * 730 * (1-$useDiscount/100) +\n  avg(node_gpu_hourly_cost) by (node) * 730 * (1-$useDiscount/100)\n) +\n\n\n# Memory\nsum(\n  avg(kube_node_status_capacity_memory_bytes) by (node) / 1024 / 1024 / 1024 * avg(node_ram_hourly_cost) by (node) * 730 * (1-$useDiscount/100)\n) +\n\n# Storage \n\nsum(avg(pv_hourly_cost) by (persistentvolume) * 730 * avg(kube_persistentvolume_capacity_bytes) by (persistentvolume) / 1024 / 1024 / 1024) \n+\nsum(sum(container_fs_limit_bytes{device!=\"tmpfs\", id=\"/\"}) by (instance) / 1024 / 1024 / 1024) * $localStorageGBCost",
            "format": "time_series",
            "intervalFactor": 1,
            "legendFormat": "total cost",
            "refId": "A"
          }
        ],
        "thresholds": [],
        "timeFrom": null,
        "timeShift": null,
        "title": "Total Cost",
        "tooltip": { "shared": true, "sort": 0, "value_type": "individual" },
        "type": "graph",
        "xaxis": {
          "buckets": null,
          "mode": "time",
          "name": null,
          "show": true,
          "values": []
        },
        "yaxes": [
          {
            "format": "currencyUSD",
            "label": null,
            "logBase": 1,
            "max": null,
            "min": "0",
            "show": true
          },
          {
            "format": "short",
            "label": null,
            "logBase": 1,
            "max": null,
            "min": null,
            "show": true
          }
        ],
        "yaxis": { "align": false, "alignLevel": null }
      },
      {
        "columns": [],
        "datasource": "default-kubecost",
        "description": "Monthly cost by resource class of currently provisioned nodes",
        "fontSize": "100%",
        "gridPos": { "h": 9, "w": 12, "x": 0, "y": 16 },
        "id": 8,
        "links": [],
        "pageSize": null,
        "scroll": true,
        "showHeader": true,
        "sort": { "col": 4, "desc": false },
        "styles": [
          {
            "alias": "",
            "colorMode": null,
            "colors": [
              "rgba(245, 54, 54, 0.9)",
              "rgba(237, 129, 40, 0.89)",
              "rgba(50, 172, 45, 0.97)"
            ],
            "dateFormat": "YYYY-MM-DD HH:mm:ss",
            "decimals": 2,
            "mappingType": 1,
            "pattern": "Time",
            "thresholds": [],
            "type": "hidden",
            "unit": "short"
          },
          {
            "alias": "Compute Cost",
            "colorMode": null,
            "colors": [
              "rgba(245, 54, 54, 0.9)",
              "rgba(237, 129, 40, 0.89)",
              "rgba(50, 172, 45, 0.97)"
            ],
            "dateFormat": "YYYY-MM-DD HH:mm:ss",
            "decimals": 2,
            "mappingType": 1,
            "pattern": "Value",
            "thresholds": [],
            "type": "number",
            "unit": "short"
          },
          {
            "alias": "CPU Cost",
            "colorMode": null,
            "colors": [
              "rgba(245, 54, 54, 0.9)",
              "rgba(237, 129, 40, 0.89)",
              "rgba(50, 172, 45, 0.97)"
            ],
            "dateFormat": "YYYY-MM-DD HH:mm:ss",
            "decimals": 2,
            "mappingType": 1,
            "pattern": "Value #A",
            "thresholds": [],
            "type": "number",
            "unit": "currencyUSD"
          },
          {
            "alias": "Mem Cost",
            "colorMode": null,
            "colors": [
              "rgba(245, 54, 54, 0.9)",
              "rgba(237, 129, 40, 0.89)",
              "rgba(50, 172, 45, 0.97)"
            ],
            "dateFormat": "YYYY-MM-DD HH:mm:ss",
            "decimals": 2,
            "mappingType": 1,
            "pattern": "Value #B",
            "thresholds": [],
            "type": "number",
            "unit": "currencyUSD"
          },
          {
            "alias": "Total",
            "colorMode": null,
            "colors": [
              "rgba(245, 54, 54, 0.9)",
              "rgba(237, 129, 40, 0.89)",
              "rgba(50, 172, 45, 0.97)"
            ],
            "dateFormat": "YYYY-MM-DD HH:mm:ss",
            "decimals": 2,
            "mappingType": 1,
            "pattern": "Value #C",
            "thresholds": [],
            "type": "number",
            "unit": "currencyUSD"
          },
          {
            "alias": "",
            "colorMode": null,
            "colors": [
              "rgba(245, 54, 54, 0.9)",
              "rgba(237, 129, 40, 0.89)",
              "rgba(50, 172, 45, 0.97)"
            ],
            "dateFormat": "YYYY-MM-DD HH:mm:ss",
            "decimals": 2,
            "mappingType": 1,
            "pattern": "instance",
            "thresholds": [],
            "type": "hidden",
            "unit": "short"
          },
          {
            "alias": "GPU Cost",
            "colorMode": null,
            "colors": [
              "rgba(245, 54, 54, 0.9)",
              "rgba(237, 129, 40, 0.89)",
              "rgba(50, 172, 45, 0.97)"
            ],
            "dateFormat": "YYYY-MM-DD HH:mm:ss",
            "decimals": 2,
            "mappingType": 1,
            "pattern": "Value #D",
            "thresholds": [],
            "type": "number",
            "unit": "currencyUSD"
          },
          {
            "alias": "",
            "colorMode": null,
            "colors": [
              "rgba(245, 54, 54, 0.9)",
              "rgba(237, 129, 40, 0.89)",
              "rgba(50, 172, 45, 0.97)"
            ],
            "decimals": 2,
            "pattern": "/.*/",
            "thresholds": [],
            "type": "number",
            "unit": "short"
          }
        ],
        "targets": [
          {
            "expr": "avg(kube_node_status_capacity_cpu_cores) by (node) * avg(node_cpu_hourly_cost or up * 0) by (node) * 730 * (1-$useDiscount/100)",
            "format": "table",
            "instant": true,
            "intervalFactor": 1,
            "legendFormat": "",
            "refId": "A"
          },
          {
            "expr": "avg(kube_node_status_capacity_memory_bytes) by (node) / 1024 / 1024 / 1024 * avg(node_ram_hourly_cost) by (node) * 730 * (1-$useDiscount/100)",
            "format": "table",
            "instant": true,
            "intervalFactor": 1,
            "legendFormat": "",
            "refId": "B"
          },
          {
            "expr": "avg(node_gpu_hourly_cost) by (node) * 730 * (1-$useDiscount/100)",
            "format": "table",
            "instant": true,
            "intervalFactor": 1,
            "refId": "D"
          },
          {
            "expr": "# CPU  \navg(kube_node_status_capacity_cpu_cores) by (node) * avg(node_cpu_hourly_cost or up * 0) by (node) * 730 * (1-$useDiscount/100) +\n# GPU\navg(node_gpu_hourly_cost) by (node) * 730 * (1-$useDiscount/100) +\n# Memory\navg(kube_node_status_capacity_memory_bytes) by (node) / 1024 / 1024 / 1024 * avg(node_ram_hourly_cost) by (node) * 730 * (1-$useDiscount/100)\n",
            "format": "table",
            "instant": true,
            "intervalFactor": 1,
            "refId": "C"
          }
        ],
        "title": "Cost by node",
        "transform": "table",
        "type": "table"
      },
      {
        "aliasColors": {},
        "bars": false,
        "dashLength": 10,
        "dashes": false,
        "datasource": "default-kubecost",
        "description": "Monthly run rate of CPU, memory, storage, and network costs based on currently provisioned resources.",
        "fill": 1,
        "gridPos": { "h": 9, "w": 12, "x": 12, "y": 16 },
        "id": 25,
        "interval": "1m",
        "legend": {
          "avg": false,
          "current": false,
          "max": false,
          "min": false,
          "show": true,
          "total": false,
          "values": false
        },
        "lines": true,
        "linewidth": 1,
        "links": [],
        "nullPointMode": "connected",
        "percentage": false,
        "pointradius": 5,
        "points": false,
        "renderer": "flot",
        "seriesOverrides": [],
        "spaceLength": 10,
        "stack": true,
        "steppedLine": false,
        "targets": [
          {
            "expr": "sum(\n  avg(kube_node_status_capacity_cpu_cores) by (node) * avg(node_cpu_hourly_cost) by (node) * 730 +\n  avg(node_gpu_hourly_cost) by (node) * 730\n)",
            "format": "time_series",
            "intervalFactor": 1,
            "legendFormat": "cpu",
            "refId": "B"
          },
          {
            "expr": "sum(\n  avg(kube_node_status_capacity_memory_bytes) by (node) / 1024 / 1024 / 1024 * avg(node_ram_hourly_cost) by (node) * 730\n)",
            "format": "time_series",
            "intervalFactor": 1,
            "legendFormat": "memory",
            "refId": "A"
          },
          {
            "expr": "sum(\n  avg(avg_over_time(pv_hourly_cost[$timeRange] offset 1m)) by (persistentvolume) * 730 \n  * avg(avg_over_time(kube_persistentvolume_capacity_bytes[$timeRange] offset 1m)) by (persistentvolume) / 1024 / 1024 / 1024\n) +\nsum(avg(container_fs_limit_bytes{device!=\"tmpfs\", id=\"/\"}) by (instance) / 1024 / 1024 / 1024) * $localStorageGBCost",
            "format": "time_series",
            "intervalFactor": 1,
            "legendFormat": "storage",
            "refId": "C"
          },
          {
            "expr": "SUM(rate(node_network_transmit_bytes_total{device=\"eth0\"}[60m]) / 1024 / 1024 / 1024 ) * (60 * 60 * 24 * 30) * $percentEgress * $egressCost ",
            "format": "time_series",
            "intervalFactor": 1,
            "legendFormat": "network",
            "refId": "D"
          }
        ],
        "thresholds": [],
        "timeFrom": null,
        "timeShift": null,
        "title": "Cost by Resource",
        "tooltip": { "shared": true, "sort": 0, "value_type": "individual" },
        "type": "graph",
        "xaxis": {
          "buckets": null,
          "mode": "time",
          "name": null,
          "show": true,
          "values": []
        },
        "yaxes": [
          {
            "format": "currencyUSD",
            "label": null,
            "logBase": 1,
            "max": null,
            "min": "0",
            "show": true
          },
          {
            "format": "short",
            "label": null,
            "logBase": 1,
            "max": null,
            "min": null,
            "show": true
          }
        ],
        "yaxis": { "align": false, "alignLevel": null }
      }
    ],
    "refresh": false,
    "schemaVersion": 16,
    "style": "dark",
    "tags": ["cost", "utilization", "metrics"],
    "templating": {
      "list": [
        {
          "auto": true,
          "auto_count": 1,
          "auto_min": "1m",
          "current": { "text": "auto", "value": "$__auto_interval_timeRange" },
          "hide": 2,
          "label": null,
          "name": "timeRange",
          "options": [
            {
              "selected": true,
              "text": "auto",
              "value": "$__auto_interval_timeRange"
            },
            { "selected": false, "text": "1h", "value": "1h" },
            { "selected": false, "text": "6h", "value": "6h" },
            { "selected": false, "text": "12h", "value": "12h" },
            { "selected": false, "text": "1d", "value": "1d" },
            { "selected": false, "text": "7d", "value": "7d" },
            { "selected": false, "text": "14d", "value": "14d" },
            { "selected": false, "text": "30d", "value": "30d" },
            { "selected": false, "text": "90d", "value": "90d" }
          ],
          "query": "1h,6h,12h,1d,7d,14d,30d,90d",
          "refresh": 2,
          "skipUrlSync": false,
          "type": "interval"
        },
        {
          "current": { "text": ".04", "value": ".04" },
          "hide": 2,
          "label": "Cost per GB-month for attached disks",
          "name": "localStorageGBCost",
          "options": [{ "selected": true, "text": ".04", "value": ".04" }],
          "query": ".04",
          "skipUrlSync": false,
          "type": "constant"
        },
        {
          "current": { "tags": [], "text": "0", "value": "0" },
          "hide": 0,
          "label": "Sustained Use Discount %",
          "name": "useDiscount",
          "options": [{ "selected": true, "text": "0", "value": "0" }],
          "query": "0",
          "skipUrlSync": false,
          "type": "constant"
        },
        {
          "current": { "text": ".1", "value": ".1" },
          "hide": 2,
          "label": null,
          "name": "percentEgress",
          "options": [{ "selected": true, "text": ".1", "value": ".1" }],
          "query": ".1",
          "skipUrlSync": false,
          "type": "constant"
        },
        {
          "current": { "text": ".12", "value": ".12" },
          "hide": 2,
          "label": null,
          "name": "egressCost",
          "options": [{ "selected": true, "text": ".12", "value": ".12" }],
          "query": ".12",
          "skipUrlSync": false,
          "type": "constant"
        }
      ]
    },
    "time": { "from": "now-7d", "to": "now" },
    "timepicker": {
      "refresh_intervals": [
        "5s",
        "10s",
        "30s",
        "1m",
        "5m",
        "15m",
        "30m",
        "1h",
        "2h",
        "1d"
      ],
      "time_options": ["5m", "15m", "1h", "6h", "12h", "24h", "2d", "7d", "30d"]
    },
    "timezone": "",
    "title": "Kubecost cluster metrics",
    "uid": "JOUdHGZZz",
    "version": 1
  }
}
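The cost panels in this dashboard all reduce to the same monthly run-rate arithmetic: a capacity multiplied by an hourly price, by 730 (roughly 365 × 24 / 12 hours per month), and by `(1-$useDiscount/100)`. The following is a hedged sketch of that arithmetic in Python. The function and parameter names are hypothetical and not part of Kubecost, and plain numbers stand in for Prometheus samples. It mirrors targets A, B, D and C of the "Cost by node" table and the network series of "Cost by Resource", which can help sanity-check what a panel should show once data arrives:

```python
# Hypothetical helpers mirroring the arithmetic in the dashboard's PromQL
# targets. Inputs are plain numbers standing in for Prometheus samples.

HOURS_PER_MONTH = 730                  # constant used throughout the queries
BYTES_PER_GIB = 1024 ** 3              # the queries divide by 1024 three times
SECONDS_PER_MONTH = 60 * 60 * 24 * 30  # 30-day month used by the network query


def discount_factor(use_discount_pct: float) -> float:
    """(1 - $useDiscount/100), taken from the dashboard variable."""
    return 1 - use_discount_pct / 100


def node_monthly_cost(cpu_cores: float, cpu_hourly: float,
                      memory_bytes: float, ram_hourly_per_gib: float,
                      gpu_hourly: float = 0.0,
                      use_discount_pct: float = 0.0) -> dict:
    """Monthly run rate per resource for one node ("Cost by node" targets)."""
    d = discount_factor(use_discount_pct)
    cpu = cpu_cores * cpu_hourly * HOURS_PER_MONTH * d                 # target A
    memory = (memory_bytes / BYTES_PER_GIB
              * ram_hourly_per_gib * HOURS_PER_MONTH * d)              # target B
    gpu = gpu_hourly * HOURS_PER_MONTH * d                             # target D
    return {"cpu": cpu, "memory": memory, "gpu": gpu,
            "total": cpu + memory + gpu}                               # target C


def monthly_egress_cost(transmit_bytes_per_sec: float,
                        percent_egress: float = 0.1,
                        egress_cost_per_gib: float = 0.12) -> float:
    """Network series: eth0 transmit rate (bytes/s) scaled to a 30-day month.

    Defaults match the $percentEgress and $egressCost dashboard constants.
    """
    gib_per_sec = transmit_bytes_per_sec / BYTES_PER_GIB
    return gib_per_sec * SECONDS_PER_MONTH * percent_egress * egress_cost_per_gib


if __name__ == "__main__":
    # Example node: 4 cores at $0.03/core-hour, 16 GiB at $0.004/GiB-hour, no GPU.
    costs = node_monthly_cost(4, 0.03, 16 * BYTES_PER_GIB, 0.004)
    print({k: round(v, 2) for k, v in costs.items()})
    # 1 MiB/s of sustained transmit on eth0 with the default variables.
    print(round(monthly_egress_cost(1024 ** 2), 3))
```

With these illustrative prices the node totals about $134.32 per month and 1 MiB/s of transmit about $30.38. Note that the cpu and memory series of "Cost by Resource" omit the discount factor (and fold GPU cost into cpu), so they agree with this sketch only when `use_discount_pct` is 0.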
dwbrown2 commented 3 years ago

Were you able to open one of these graphs and see what response you are getting from Prometheus? This is available in the Query Inspector. Here's what it looks like:
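The same check can be made outside Grafana by sending a panel's PromQL straight to the Prometheus HTTP API (`/api/v1/query`). Below is a minimal sketch, assuming Prometheus is reachable at `http://localhost:9090` (for example through `kubectl port-forward` to the bundled Prometheus service). The URL and helper names are assumptions for illustration:

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus is reachable here, e.g. via a kubectl port-forward.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(base_url: str, promql: str) -> str:
    """Instant-query URL for the Prometheus HTTP API (/api/v1/query)."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def summarize(response: dict) -> list:
    """Return (labels, value) pairs from an instant-vector response.

    Raises if the body reports a non-success status (a defensive check;
    urlopen already raises HTTPError for 4xx/5xx replies to bad queries).
    """
    if response.get("status") != "success":
        raise RuntimeError(f"{response.get('errorType')}: {response.get('error')}")
    return [(r["metric"], float(r["value"][1])) for r in response["data"]["result"]]


def instant_query(base_url: str, promql: str, timeout: float = 10.0) -> list:
    """Run an instant query and summarize the result vector."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return summarize(json.load(resp))
```

For example, `instant_query(PROMETHEUS_URL, "avg(node_cpu_hourly_cost) by (node)")` returning an empty list means the metric has no series, while a timeout or connection error points at Prometheus itself rather than at the Grafana dashboard.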

[Screenshot: Grafana Query Inspector showing the Prometheus response]

munjalpatel commented 3 years ago

Thanks for your help, @dwbrown2. I resolved the issue: my node was under memory pressure, so it was having trouble processing requests.
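Memory pressure like this is reported by the kubelet as the `MemoryPressure` node condition, which `kubectl describe node` also displays. A minimal sketch for spotting it across all nodes, assuming `kubectl` is installed and pointed at the cluster (the helper names are hypothetical):

```python
import json
import subprocess


def nodes_under_pressure(node_list: dict, condition: str = "MemoryPressure") -> list:
    """Names of nodes whose given condition currently has status "True"."""
    flagged = []
    for node in node_list.get("items", []):
        for cond in node.get("status", {}).get("conditions", []):
            if cond.get("type") == condition and cond.get("status") == "True":
                flagged.append(node["metadata"]["name"])
    return flagged


def load_nodes() -> dict:
    """Fetch the node list as JSON (assumes kubectl is on PATH)."""
    out = subprocess.run(["kubectl", "get", "nodes", "-o", "json"],
                         check=True, capture_output=True, text=True).stdout
    return json.loads(out)
```

Calling `nodes_under_pressure(load_nodes())` lists affected nodes; passing `condition="DiskPressure"` or `"PIDPressure"` checks the other kubelet pressure conditions the same way.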