thanos-io / kube-thanos

Kubernetes specific configuration for deploying Thanos.
Apache License 2.0

Not all VMs are equal. Support for kubernetes tolerations and nodeAffinity #233

Closed ahysing closed 3 years ago

ahysing commented 3 years ago

I work in an organisation that makes heavy use of Kubernetes on Microsoft Azure AKS. Thanos and kube-thanos have worked out great for us. However, Thanos requires more memory than our ordinary application nodes provide. The solution is to schedule Thanos on a separate node pool with more memory than the one used for normal applications. To achieve this, one could combine two Kubernetes features: Taints and Tolerations, and Node Affinity.

In the current version of kube-thanos, these two fields are not configurable. I hope to contribute a pull request to the community that makes both sections configurable with jsonnet-bundler.

The end result should add tolerations to all objects of kind: Deployment:

apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: query-layer
      app.kubernetes.io/instance: thanos-query
      app.kubernetes.io/name: thanos-query
  template:
    metadata:
      labels:
        app.kubernetes.io/component: query-layer
        app.kubernetes.io/instance: thanos-query
        app.kubernetes.io/name: thanos-query
        app.kubernetes.io/version: v0.19.0
    spec:
      tolerations:
        - effect: NoSchedule
          key: CriticalAddonsOnly
          operator: Equal
          value: "true"
...

The end result should also add nodeAffinity to all objects of kind: Deployment:

apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: query-layer
      app.kubernetes.io/instance: thanos-query
      app.kubernetes.io/name: thanos-query
  template:
    metadata:
      labels:
        app.kubernetes.io/component: query-layer
        app.kubernetes.io/instance: thanos-query
        app.kubernetes.io/name: thanos-query
        app.kubernetes.io/version: v0.19.0
    spec:
      ...
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: agentpool
                operator: In
                values:
                - systempool
        podAntiAffinity:
        ...

A working solution should build on standard Kubernetes configuration and be generic enough to fit a similar setup on all major cloud providers.

There might be other ways to run Thanos on dedicated hardware on Azure Kubernetes Service. My proposal might not be the only good solution.

ahysing commented 3 years ago

My employer really needs this feature, and we are willing to put in the time and effort to build and maintain it. I hope we can get some attention on this issue. Access to push feature branches would be highly appreciated.

ahysing commented 3 years ago

After getting no attention here, we found a different way to achieve the same result. We are now able to add taints and tolerations via kustomize.
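The exact kustomize configuration is not shown in this thread. As a rough sketch only, one way this might look is a `kustomization.yaml` that applies a JSON 6902 patch to every Deployment in the rendered manifests. The resource path `manifests/thanos-query-deployment.yaml` is a hypothetical placeholder, and the toleration values are copied from the example earlier in this issue:

```yaml
# Sketch under assumptions: the kube-thanos manifests have already been
# rendered to plain YAML, and the file path below is a placeholder.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - manifests/thanos-query-deployment.yaml  # hypothetical path

patches:
  # The target selector applies the patch to every Deployment in `resources`.
  - target:
      kind: Deployment
    patch: |-
      # "add" creates the tolerations list, or replaces it if one already exists.
      - op: add
        path: /spec/template/spec/tolerations
        value:
          - key: CriticalAddonsOnly
            operator: Equal
            value: "true"
            effect: NoSchedule
```

Running `kustomize build .` in the directory containing this file should print the Deployments with the toleration added. A nodeAffinity block could presumably be added the same way, though that is not something this thread confirms.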

Closing this.