Helm values nodeSelector does not schedule Pods to Nodes (v1.88.1)

BnJam commented 2 years ago

Describe the bug

K8s: 1.21

When deploying via Helm and trying to place pods to user selected nodes by editing the values.yaml file, the .nodeSelector: term is ignored or causes no pods to be scheduled at all. Except for Grafana. This is happening with both the global nodeSelector/tolerations and also attempting to set it for specific pods; ie: networkCosts.nodeSelector/networkCosts.tolerations (networkCosts.enabled=true).

Deploying via Helm:

helm upgrade -i kubecost kubecost/cost-analyzer \
    --namespace kubecost \
    --create-namespace \
    --values values.yaml \
    --version v1.88.1

To Reproduce Steps to reproduce the behavior:

Edit the values.yaml file nodeSelector and tolerations
redeploy chart
observe no Pods being scheduled or stuck in "Pending" (this happens for the cost-analyzer Pod).

Expected behavior The expected behaviour is that the Pods are scheduled to the appropriate node or at the very least show in a Pending status.

gz#1172

AjayTripathy commented 2 years ago

Hi @BnJam could you share the values.yaml that you're using for the nodeselector?

BnJam commented 2 years ago

Here is the yaml I am using with the labels renamed - currently all the nodeSelector configs are commented out.

values.yaml

```yaml global: # zone: cluster.local (use only if your DNS server doesn't live in the same zone as kubecost) prometheus: enabled: true # If false, Prometheus will not be installed -- only actively supported on paid Kubecost plans fqdn: http://cost-analyzer-prometheus-server.default.svc #example address of a prometheus to connect to. Include protocol (http:// or https://) Ignored if enabled: true # insecureSkipVerify : false # If true, kubecost will not check the TLS cert of prometheus # queryServiceBasicAuthSecretName: dbsecret # kubectl create secret generic dbsecret -n kubecost --from-file=USERNAME --from-file=PASSWORD # queryServiceBearerTokenSecretName: dbsecret # kubectl create secret generic mcdbsecret -n kubecost --from-file=TOKEN # Durable storage option, product key required thanos: enabled: false # queryService: http://kubecost-thanos-query-frontend-http.kubecost:{{ .Values.thanos.queryFrontend.http.port }} # an address of the thanos query-frontend endpoint, if different from installed thanos # queryServiceBasicAuthSecretName: mcdbsecret # kubectl create secret generic mcdbsecret -n kubecost --from-file=USERNAME --from-file=PASSWORD <---enter basic auth credentials like that # queryServiceBearerTokenSecretName mcdbsecret # kubectl create secret generic mcdbsecret -n kubecost --from-file=TOKEN # queryOffset: 3h # The offset to apply to all thanos queries in order to achieve syncronization on all cluster block stores grafana: enabled: true # If false, Grafana will not be installed domainName: cost-analyzer-grafana.default.svc #example grafana domain Ignored if enabled: true scheme: "http" # http or https, for the domain name above. proxy: true # If true, the kubecost frontend will route to your grafana through its service endpoint notifications: # Kubecost alerting configuration # Ref: http://docs.kubecost.com/alerts # alertConfigs: # frontendUrl: http://localhost:9090 # optional, used for linkbacks # globalSlackWebhookUrl: https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX # optional, used for Slack alerts # globalAlertEmails: # - recipient@example.com # - additionalRecipient@example.com # Alerts generated by kubecost, about cluster data # alerts: # Daily namespace budget alert on namespace `kubecost` # - type: budget # supported: budget, recurringUpdate # threshold: 50 # optional, required for budget alerts # window: daily # or 1d # aggregation: namespace # filter: kubecost # ownerContact: # optional, overrides globalAlertEmails default # - owner@example.com # - owner2@example.com # # optional, used for alert-specific Slack alerts # slackWebhookUrl: https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX # Daily cluster budget alert on cluster `cluster-one` # - type: budget # threshold: 200.8 # optional, required for budget alerts # window: daily # or 1d # aggregation: cluster # filter: cluster-one # does not accept csv # Recurring weekly update (weeklyUpdate alert) # - type: recurringUpdate # window: weekly # or 7d # aggregation: namespace # filter: '*' # Recurring weekly namespace update on kubecost namespace # - type: recurringUpdate # window: weekly # or 7d # aggregation: namespace # filter: kubecost # Spend Change Alert # - type: spendChange # change relative to moving avg # relativeThreshold: 0.20 # Proportional change relative to baseline. Must be greater than -1 (can be negative) # window: 1d # accepts ‘d’, ‘h’ # baselineWindow: 30d # previous window, offset by window # aggregation: namespace # filter: kubecost, default # accepts csv # Health Score Alert # - type: health # Alerts when health score changes by a threshold # window: 10m # threshold: 5 # Send Alert if health scores changes by 5 or more # Kubecost Health Diagnostic # - type: diagnostic # Alerts when kubecost is is unable to compute costs - ie: Prometheus unreachable # window: 10m alertmanager: # Supply an alertmanager FQDN to receive notifications from the app. enabled: false # If true, allow kubecost to write to your alertmanager fqdn: http://cost-analyzer-prometheus-server.default.svc #example fqdn. Ignored if prometheus.enabled: true # Set saved report(s) accessible from reports.html # Ref: http://docs.kubecost.com/saved-reports savedReports: enabled: false # If true, overwrites report parameters set through UI reports: - title: "Example Saved Report 0" window: "today" aggregateBy: "namespace" idle: "separate" accumulate: false # daily resolution filters: - property: "cluster" value: "cluster-one,cluster*" # supports wildcard filtering and multiple comma separated values - property: "namespace" value: "kubecost" - title: "Example Saved Report 1" window: "month" aggregateBy: "controllerKind" idle: "share" accumulate: false filters: - property: "label" value: "app:cost*,environment:kube*" - property: "namespace" value: "kubecost" - title: "Example Saved Report 2" window: "2020-11-11T00:00:00Z,2020-12-09T23:59:59Z" aggregateBy: "service" idle: "hide" accumulate: true # entire window resolution filters: [] # if no filters, specify empty array # Set saved report(s) accessible from reports.html # Ref: http://docs.kubecost.com/saved-reports assetReports: enabled: false # If true, overwrites report parameters set through UI reports: - title: "Example Asset Report 0" window: "today" aggregateBy: "type" accumulate: false # daily resolution filters: - property: "cluster" value: "cluster-one" podAnnotations: {} # iam.amazonaws.com/role: role-arn additionalLabels: {} # generated at http://kubecost.com/install, used for alerts tracking and free trials kubecostToken: "..." # Advanced pipeline for custom prices, enterprise key required pricingCsv: enabled: false location: provider: "AWS" region: "us-east-1" URI: s3://kc-csv-test/pricing_schema.csv # a valid file URI csvAccessCredentials: pricing-schema-access-secret # SAML integration for user management and RBAC, enterprise key required # Ref: https://github.com/kubecost/docs/blob/master/user-management.md saml: enabled: false secretName: "kubecost-authzero" #metadataSecretName: "kubecost-authzero-metadata" # One of metadataSecretName or idpMetadataURL must be set. defaults to metadataURL if set idpMetadataURL: "https://dev-elu2z98r.auth0.com/samlp/metadata/c6nY4M37rBP0qSO1IYIqBPPyIPxLS8v2" appRootURL: "http://localhost:9090" # sample URL # audienceURI: "http://localhost:9090" # by convention, the same as the appRootURL, but any string uniquely identifying kubecost to your samp IDP. Optional if you follow the convention # nameIDFormat: "urn:oasis:names:tc:SAML:1.1:nameid-format:unspecified" If your SAML provider requires a specific nameid format rbac: enabled: false groups: - name: admin enabled: false # if admin is disabled, all SAML users will be able to make configuration changes to the kubecost frontend assertionName: "http://schemas.auth0.com/userType" # a SAML Assertion, one of whose elements has a value that matches on of the values in assertionValues assertionValues: - "admin" - "superusers" - name: readonly enabled: false # if readonly is disabled, all users authorized on SAML will default to readonly assertionName: "http://schemas.auth0.com/userType" assertionvalues: - "readonly" # Adds an httpProxy as an environment variable. systemProxy.enabled must be `true`to have any effect. # Ref: https://www.oreilly.com/library/view/security-with-go/9781788627917/5ea6a02b-3d96-44b1-ad3c-6ab60fcbbe4f.xhtml systemProxy: enabled: false httpProxyUrl: "" httpsProxyUrl: "" noProxy: "" # imagePullSecrets: # - name: "image-pull-secret" kubecostFrontend: image: "gcr.io/kubecost1/frontend" imagePullPolicy: Always resources: requests: cpu: "10m" memory: "55Mi" #limits: # cpu: "100m" # memory: "256Mi" # nodeSelector: # node_label: node_label_match # tolerations: # - key: node_key # operator: "Equal" # value: "key_value" # effect: "NoSchedule" kubecost: # Enables the cost-analyzer-server container to spin up as part of kubecost # Setting this to false will mean all /api/ endpoints will be unavailable disableServer: true image: "gcr.io/kubecost1/server" resources: requests: cpu: "100m" memory: "55Mi" limits: cpu: "100m" memory: "256Mi" # nodeSelector: # node_label: node_label_match # tolerations: # - key: node_key # operator: "Equal" # value: "key_value" # effect: "NoSchedule" kubecostModel: image: "gcr.io/kubecost1/cost-model" imagePullPolicy: Always # Enables the emission of the kubecost_cloud_credit_total and # kubecost_cloud_expense_total metrics outOfClusterPromMetricsEnabled: false # Build local cost allocation cache warmCache: false # Build local savings cache warmSavingsCache: true # Run allocation ETL pipelines etl: true # The total number of days the ETL pipelines will build # Set to 0 to disable daily ETL (not recommended) etlDailyStoreDurationDays: 91 # The total number of hours the ETL pipelines will build # Set to 0 to disable hourly ETL (not recommended) etlHourlyStoreDurationHours: 49 # max number of concurrent Prometheus queries maxQueryConcurrency: 5 resources: requests: cpu: "200m" memory: "55Mi" #limits: # cpu: "800m" # memory: "256Mi" # nodeSelector: # node_label: node_label_match # tolerations: # - key: node_key # operator: "Equal" # value: "key_value" # effect: "NoSchedule" # Basic Kubecost ingress, more examples available at https://github.com/kubecost/docs/blob/master/ingress-examples.md ingress: enabled: false className: nginx annotations: # kubernetes.io/ingress.class: nginx # kubernetes.io/tls-acme: "true" paths: ["/"] # There's no need to route specifically to the pods-- we have an nginx deployed that handles routing pathType: ImplementationSpecific hosts: - cost-analyzer.local tls: [] # - secretName: cost-analyzer-tls # hosts: # - cost-analyzer.local # nodeSelector: # node_label: node_label_match # tolerations: # - key: node_key # operator: "Equal" # value: "key_value" # effect: "NoSchedule" # affinity: {} # If true, creates a PriorityClass to be used by the cost-analyzer pod priority: enabled: false # value: 1000000 # If true, enable creation of NetworkPolicy resources. networkPolicy: enabled: false denyEgress: true # create a network policy that denies egress from kubecost sameNamespace: true # Set to true if cost analyser and prometheus are on the same namespace # namespace: kubecost # Namespace where prometheus is installed # Cost-analyzer specific vars using the new template costAnalyzer: enabled: false # If true, create a newtork policy for cost-analzyer annotations: {} # annotations to be added to the network policy additionalLabels: {} # additional labels to be added to the network policy # Examples rules: # ingressRules: # - selectors: # allow ingress from self on all ports # - podSelector: # matchLabels: # app.kubernetes.io/name: cost-analyzer # - selectors: # allow egress access to prometheus # - namespaceSelector: # matchLabels: # name: prometheus # podSelector: # matchLabels: # app: prometheus # ports: # - protocol: TCP # port: 9090 # egressRules: # - selectors: # restrict egress to inside cluster # - namespaceSelector: {} podSecurityPolicy: enabled: true ## @param extraVolumes A list of volumes to be added to the pod ## # extraVolumes: [] ## @param extraVolumeMounts A list of volume mounts to be added to the pod ## # extraVolumeMounts: [] # Define persistence volume for cost-analyzer, more information at https://github.com/kubecost/docs/blob/master/storage.md persistentVolume: size: 32Gi dbSize: 32.0Gi enabled: true # Note that setting this to false means configurations will be wiped out on pod restart. # storageClass: "-" # # existingClaim: kubecost-cost-analyzer # a claim in the same namespace as kubecost service: type: ClusterIP port: 9090 targetPort: 9090 # nodePort: labels: {} annotations: {} # Enabling long-term durable storage with Postgres requires an enterprise license remoteWrite: postgres: enabled: false initImage: "gcr.io/kubecost1/sql-init" initImagePullPolicy: Always installLocal: true remotePostgresAddress: "" # ignored if installing locally persistentVolume: size: 200Gi auth: password: admin # change me prometheus: extraScrapeConfigs: | - job_name: kubecost honor_labels: true scrape_interval: 1m scrape_timeout: 10s metrics_path: /metrics scheme: http dns_sd_configs: - names: - {{ template "cost-analyzer.serviceName" . }} type: 'A' port: 9003 - job_name: kubecost-networking kubernetes_sd_configs: - role: pod relabel_configs: # Scrape only the the targets matching the following metadata - source_labels: [__meta_kubernetes_pod_label_app] action: keep regex: {{ template "cost-analyzer.networkCostsName" . }} server: # If clusterIDConfigmap is defined, instead use user-generated configmap with key CLUSTER_ID # to use as unique cluster ID in kubecost cost-analyzer deployment. # This overrides the cluster_id set in prometheus.server.global.external_labels. # NOTE: This does not affect the external_labels set in prometheus config. # clusterIDConfigmap: cluster-id-configmap resources: limits: cpu: 500m memory: 512Mi requests: cpu: 500m memory: 512Mi global: scrape_interval: 1m scrape_timeout: 10s evaluation_interval: 1m external_labels: cluster_id: cluster-one # Each cluster should have a unique ID persistentVolume: size: 32Gi enabled: true extraArgs: query.max-concurrency: 1 query.max-samples: 100000000 # nodeSelector: # node_label: node_label_match # tolerations: # - key: node_key # operator: "Equal" # value: "key_value" # effect: "NoSchedule" alertmanager: enabled: false persistentVolume: enabled: true nodeExporter: enabled: true pushgateway: enabled: false persistentVolume: enabled: true serverFiles: # prometheus.yml: # Sample block -- enable if using an in cluster durable store. # remote_write: # - url: "http://pgprometheus-adapter:9201/write" # write_relabel_configs: # - source_labels: [__name__] # regex: 'container_.*_allocation|container_.*_allocation_bytes|.*_hourly_cost|kube_pod_container_resource_requests{resource="memory", unit="byte"}|container_memory_working_set_bytes|kube_pod_container_resource_requests{resource="cpu", unit="core"}|kube_pod_container_resource_requests|pod_pvc_allocation|kube_namespace_labels|kube_pod_labels' # action: keep # queue_config: # max_samples_per_send: 1000 #remote_read: # - url: "http://pgprometheus-adapter:9201/read" rules: groups: - name: CPU rules: - expr: sum(rate(container_cpu_usage_seconds_total{container_name!=""}[5m])) record: cluster:cpu_usage:rate5m - expr: rate(container_cpu_usage_seconds_total{container_name!=""}[5m]) record: cluster:cpu_usage_nosum:rate5m - expr: avg(irate(container_cpu_usage_seconds_total{container_name!="POD", container_name!=""}[5m])) by (container_name,pod_name,namespace) record: kubecost_container_cpu_usage_irate - expr: sum(container_memory_working_set_bytes{container_name!="POD",container_name!=""}) by (container_name,pod_name,namespace) record: kubecost_container_memory_working_set_bytes - expr: sum(container_memory_working_set_bytes{container_name!="POD",container_name!=""}) record: kubecost_cluster_memory_working_set_bytes - name: Savings rules: - expr: sum(avg(kube_pod_owner{owner_kind!="DaemonSet"}) by (pod) * sum(container_cpu_allocation) by (pod)) record: kubecost_savings_cpu_allocation labels: daemonset: "false" - expr: sum(avg(kube_pod_owner{owner_kind="DaemonSet"}) by (pod) * sum(container_cpu_allocation) by (pod)) / sum(kube_node_info) record: kubecost_savings_cpu_allocation labels: daemonset: "true" - expr: sum(avg(kube_pod_owner{owner_kind!="DaemonSet"}) by (pod) * sum(container_memory_allocation_bytes) by (pod)) record: kubecost_savings_memory_allocation_bytes labels: daemonset: "false" - expr: sum(avg(kube_pod_owner{owner_kind="DaemonSet"}) by (pod) * sum(container_memory_allocation_bytes) by (pod)) / sum(kube_node_info) record: kubecost_savings_memory_allocation_bytes labels: daemonset: "true" - expr: label_replace(sum(kube_pod_status_phase{phase="Running",namespace!="kube-system"} > 0) by (pod, namespace), "pod_name", "$1", "pod", "(.+)") record: kubecost_savings_running_pods - expr: sum(rate(container_cpu_usage_seconds_total{container_name!="",container_name!="POD",instance!=""}[5m])) by (namespace, pod_name, container_name, instance) record: kubecost_savings_container_cpu_usage_seconds - expr: sum(container_memory_working_set_bytes{container_name!="",container_name!="POD",instance!=""}) by (namespace, pod_name, container_name, instance) record: kubecost_savings_container_memory_usage_bytes - expr: avg(sum(kube_pod_container_resource_requests{resource="cpu", unit="core", namespace!="kube-system"}) by (pod, namespace, instance)) by (pod, namespace) record: kubecost_savings_pod_requests_cpu_cores - expr: avg(sum(kube_pod_container_resource_requests{resource="memory", unit="byte", namespace!="kube-system"}) by (pod, namespace, instance)) by (pod, namespace) record: kubecost_savings_pod_requests_memory_bytes ## Module for measuring network costs ## Ref: https://github.com/kubecost/docs/blob/master/network-allocation.md networkCosts: enabled: true podSecurityPolicy: enabled: false image: gcr.io/kubecost1/kubecost-network-costs:v15.7 imagePullPolicy: Always # For existing Prometheus Installs, create a Service which generates Endpoints for each of the network-costs pods. # This Service is annotated with prometheus.io/scrape: "true" to automatically get picked up by the prometheus config. # NOTE: Setting this option to true and leaving the above extraScrapeConfig "job_name: kubecost-networking" configured will cause the # NOTE: pods to be scraped twice. prometheusScrape: true # Traffic Logging will enable logging the top 5 destinations for each source # every 30 minutes. trafficLogging: true # Port will set both the containerPort and hostPort to this value. # These must be identical due to network-costs being run on hostNetwork port: 3001 resources: requests: cpu: "50m" memory: "20Mi" config: # # Configuration for traffic destinations, including specific classification # # for IPs and CIDR blocks. This configuration will act as an override to the # # automatic classification provided by network-costs. # destinations: # # In Zone contains a list of address/range that will be # # classified as in zone. # in-zone: # # Loopback # - "127.0.0.1" # # IPv4 Link Local Address Space # - "169.254.0.0/16" # # Private Address Ranges in RFC-1918 # - "10.0.0.0/8" # Remove this entry if using Multi-AZ Kubernetes # - "172.16.0.0/12" # - "192.168.0.0/16" # In Region contains a list of address/range that will be # classified as in region. This is synonymous with cross # zone traffic, where the regions between source and destinations # are the same, but the zone is different. in-region: [] # Cross Region contains a list of address/range that will be # classified as non-internet egress from one region to another. cross-region: [] # Direct Classification specifically maps an ip address or range # to a region (required) and/or zone (optional). This classification # takes priority over in-zone, in-region, and cross-region configurations. direct-classification: [] # - region: "us-east1" # zone: "us-east1-c" # ips: # - "10.0.0.0/24" ## Node tolerations for server scheduling to nodes with taints ## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/ ## # nodeSelector: # node_label: node_label_match # tolerations: # - key: node_key # operator: "Equal" # value: "key_value" # effect: "NoSchedule" # affinity: {} ## PriorityClassName ## Ref: https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass # priorityClassName: [] ## PodMonitor ## Allows scraping of network metrics from a dedicated prometheus operator setup podMonitor: enabled: false additionalLabels: kubecost-networking: enabled additionalLabels: {} # nodeSelector: {} # annotations: {} # Kubecost Deployment Configuration # Used for HA mode in Business & Enterprise tier kubecostDeployment: replicas: 1 # Kubecost Cluster Controller for Right Sizing and Cluster Turndown clusterController: enabled: false image: gcr.io/kubecost1/cluster-controller:v0.0.2 imagePullPolicy: Always reporting: # Kubecost bug report feature: Logs access/collection limited to .Release.Namespace # Ref: http://docs.kubecost.com/bug-report logCollection: true # Basic frontend analytics productAnalytics: true # Report Javascript errors errorReporting: true valuesReporting: true serviceMonitor: enabled: false additionalLabels: {} prometheusRule: enabled: false additionalLabels: {} supportNFS: false # initChownDataImage ensures all Kubecost filepath permissions on PV or local storage are set up correctly. initChownDataImage: "busybox" # Supports a fully qualified Docker image, e.g. registry.hub.docker.com/library/busybox:latest initChownData: resources: {} #requests: # cpu: "50m" # memory: "20Mi" grafana: # nodeSelector: # node_label: node_label_match # tolerations: # - key: node_key # operator: "Equal" # value: "key_value" # effect: "NoSchedule" # namespace_datasources: kubecost # override the default namespace here # namespace_dashboards: kubecost # override the default namespace here sidecar: dashboards: enabled: true # label that the configmaps with dashboards are marked with label: grafana_dashboard # set sidecar ERROR_THROTTLE_SLEEP env var from default 5s to 0s -> fixes https://github.com/kubecost/cost-analyzer-helm-chart/issues/877 annotations: {} error_throttle_sleep: 0 datasources: # dataSourceFilename: foo.yml # If you need to change the name of the datasource file enabled: true defaultDatasourceEnabled: false dataSourceName: prometheus-kubecost # label that the configmaps with datasources are marked with label: kubecost_grafana_datasource # set sidecar ERROR_THROTTLE_SLEEP env var from default 5s to 0s -> fixes https://github.com/kubecost/cost-analyzer-helm-chart/issues/877 error_throttle_sleep: 0 # For grafana to be accessible, add the path to root_url. For example, if you run kubecost at www.foo.com:9090/kubecost # set root_url to "%(protocol)s://%(domain)s:%(http_port)s/kubecost/grafana". No change is necessary here if kubecost runs at a root URL grafana.ini: server: root_url: "%(protocol)s://%(domain)s:%(http_port)s/grafana" serviceAccount: create: true # Set this to false if you're bringing your own service account. annotations: {} # name: kc-test awsstore: useAwsStore: false createServiceAccount: false kubecostMetrics: emitPodAnnotations: "true" emitNamespaceAnnotations: "true" # emitKsmV1Metrics: true # emit all KSM metrics in KSM v1. # emitKsmV1MetricsOnly: false # emit only the KSM metrics missing from KSM v2. Advanced users only. # readonly: false # disable updates to kubecost from the frontend UI and via POST request # These configs can also be set from the Settings page in the Kubecost product UI # Values in this block override config changes in the Settings UI on pod restart # kubecostProductConfigs: # An optional list of cluster definitions that can be added for frontend access. The local # cluster is *always* included by default, so this list is for non-local clusters. # Ref: https://github.com/kubecost/docs/blob/master/multi-cluster.md # clusters: # - name: "Cluster A" # address: http://cluster-a.kubecost.com:9090 # # Optional authentication credentials - only basic auth is currently supported. # auth: # type: basic # # Secret name should be a secret formatted based on: https://github.com/kubecost/docs/blob/master/ingress-examples.md # secretName: cluster-a-auth # # Or pass auth directly as base64 encoded user:pass # data: YWRtaW46YWRtaW4= # # Or user and pass directly # user: admin # pass: admin # - name: "Cluster B" # address: http://cluster-b.kubecost.com:9090 # defaultModelPricing: # default monthly resource prices, used predominately for on-prem clusters # CPU: 28.0 # spotCPU: 4.86 # RAM: 3.09 # spotRAM: 0.65 # GPU: 693.50 # spotGPU: 225.0 # storage: 0.04 # zoneNetworkEgress: 0.01 # regionNetworkEgress: 0.01 # internetNetworkEgress: 0.12 # enabled: true # # The cluster profile represents a predefined set of parameters to use when calculating savings. # # Possible values are: [ development, production, high-availability ] # clusterProfile: production # customPricesEnabled: false # This makes the default view custom prices-- generally used for on-premises clusters # spotLabel: lifecycle # spotLabelValue: Ec2Spot # gpuLabel: gpu # gpuLabelValue: true # awsServiceKeyName: ACCESSKEYID # awsServiceKeyPassword: fakepassword # Only use if your values.yaml are stored encrypted. Otherwise provide an existing secret via serviceKeySecretName # awsSpotDataRegion: us-east-1 # awsSpotDataBucket: spot-data-feed-s3-bucket # awsSpotDataPrefix: dev # athenaProjectID: "530337586277" # The AWS AccountID where the Athena CUR is. Generally your masterpayer account # athenaBucketName: "s3://aws-athena-query-results-530337586277-us-east-1" # athenaRegion: us-east-1 # athenaDatabase: athenacurcfn_athena_test1 # athenaTable: "athena_test1" # masterPayerARN: "" # projectID: "123456789" # Also known as AccountID on AWS -- the current account/project that this instance of Kubecost is deployed on. # gcpSecretName: gcp-secret # Name of a secret representing the gcp service key # bigQueryBillingDataDataset: billing_data.gcp_billing_export_v1_01AC9F_74CF1D_5565A2 labelMappingConfigs: # names of k8s labels used to designate different allocation concepts enabled: true # owner_label: "owner" # team_label: "team" # department_label: "dept" # product_label: "product" # environment_label: "env" # namespace_external_label: "kubernetes_namespace" # external labels are used to map external cloud costs to kubernetes concepts # cluster_external_label: "kubernetes_cluster" # controller_external_label: "kubernetes_controller" # product_external_label: "kubernetes_label_app" # service_external_label: "kubernetes_service" # deployment_external_label: "kubernetes_deployment" # owner_external_label: "kubernetes_label_owner" # team_external_label: "kubernetes_label_team" # environment_external_label: "kubernetes_label_env" # department_external_label: "kubernetes_label_department" # statefulset_external_label: "kubernetes_statefulset" # daemonset_external_label: "kubernetes_daemonset" # pod_external_label: "kubernetes_pod" # grafanaURL: "" # clusterName: "" # used for display in Kubecost UI # currencyCode: "USD" # offical support for USD, CAD, EUR, and CHF # azureBillingRegion: US # Represents 2-letter region code, e.g. West Europe = NL, Canada = CA. ref: https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes # azureSubscriptionID: 0bd50fdf-c923-4e1e-850c-196dd3dcc5d3 # azureClientID: f2ef6f7d-71fb-47c8-b766-8d63a19db017 # azureTenantID: 72faf3ff-7a3f-4597-b0d9-7b0b201bb23a # azureClientPassword: fake key # Only use if your values.yaml are stored encrypted. Otherwise provide an existing secret via serviceKeySecretName # azureStorageAccount: "kubecoststore" # Name of Azure Storage account where cost report is being saved # azureStorageAccessKey: "" # Azure Storage account access key found in Azure Storage portal for account where cost report is being saved # azureStorageContainer: "test" # Name of daily month-to-date cost report for Azure # azureStorageSecretName: "azure-storage-config" # Name of Kubernetes Secret where Azure Storage Configuration is stored # azureStorageCreateSecret: true # Create secret for Azure Storage config from values in this file # discount: "" # percentage discount applied to compute # negotiatedDiscount: "" # custom negotiated cloud provider discount # defaultIdle: false # serviceKeySecretName: "" # Use an existing AWS or Azure secret with format as in aws-service-key-secret.yaml or azure-service-key-secret.yaml. Leave blank if using createServiceKeySecret # createServiceKeySecret: true # Creates a secret representing your cloud service key based on data in values.yaml. If you are storing unencrypted values, add a secret manually # sharedNamespaces: "" # namespaces with shared workloads, example value: "kube-system\,ingress-nginx\,kubecost\,monitoring" # sharedOverhead: "" # value representing a fixed external cost per month to be distributed among aggregations. # shareTenancyCosts: true # enable or disable sharing costs such as cluster management fees (defaults to "true" on Settings page) # productKey: # apply business or enterprise product license # key: "" # enabled: false # secretname: productkeysecret # create a secret out of a file named productkey.json of format { "key": "kc-b1325234" } # cloudIntegrationSecret: "cloud-integration" ```

Sean-Holcomb commented 2 years ago

Hey @BnJam, I believe I was able to reproduce the issue. When I redeployed and saw my cost-analyzer pod was pending this was in the events of kubectl describe po <cost-analyzer-pod-name>

I applied the selector label and taint to one node and the error suggests that k8s sees this, but the cost-analyser pv is in a different zone than the node that has the correct label and taint, so it still cannot be scheduled. To fix this delete the cost-analyzer pvc and restart the cost-analyzer pod.

If this is different from the issue that you are encountering, please share the events from the describe command above to help give a better idea of why the pod is not scheduling.

BnJam commented 2 years ago

Y'know, after dealing with this same thing with JupyterHub, it should have been obvious.

That was exactly the issue @Sean-Holcomb - thank you

Everything seems to be scheduling on to the right Nodes except these Pods which stayed on the default nodepool:

kubecost-kube-state-metric-*
kubecost-prometheus-node-exporter-*

BnJam commented 2 years ago

After setting the appropriate values for prometheus.kube-state-metrics.nodeSelector, prometheus.kube-state-metrics.tolerations and setting all the tolerations in prometheus.nodeExporter.tolerations, everything is exactly where I want it to be.

Sean-Holcomb commented 2 years ago

I am glad to hear it worked out!

kubecost / cost-analyzer-helm-chart

Helm values nodeSelector does not schedule Pods to Nodes (v1.88.1) #1180