sentry-kubernetes / charts

Easily deploy Sentry on your Kubernetes Cluster
MIT License

sentry-web kubernetes pod distribution #1442

Closed · evgenii-avdiukhin closed this issue 2 months ago

evgenii-avdiukhin commented 2 months ago


Describe the bug (actual behavior)

I have Sentry deployed in Kubernetes using the Helm chart, with the replica count of key components increased to 2-3 (web, worker, cron, relay, etc.). I also have a pod-anti-affinity cluster policy deployed with Kyverno.

All pods except sentry-worker are distributed evenly across the worker nodes; all 3 sentry-worker pods are scheduled on worker-1. Moreover, whenever I shut down worker-1 to check whether the pods get evicted and rescheduled, the sentry-web pods are stuck in Pending with this event:

0/8 nodes are available: 3 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 5 node(s) didn't match pod anti-affinity rules. preemption: 0/8 nodes are available: 3 Preemption is not helpful for scheduling, 5 No preemption victims found for incoming pod.

The Kyverno policies are cluster-wide and are applied the same way to all Sentry objects. The problem is only with the sentry-web deployment; everything else works fine and distributes and evicts as it should. Am I missing something here? I have been stuck with this issue for a couple of days and cannot find a solution.
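For context, the Kyverno policy injects roughly the kind of required pod anti-affinity shown below into each deployment; the label selector and topology key here are illustrative, not copied verbatim from the policy:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      # Illustrative sketch of the injected rule, not the exact Kyverno output:
      # never schedule two pods with the same role label onto the same node.
      - labelSelector:
          matchLabels:
            role: web
        topologyKey: kubernetes.io/hostname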

Expected behavior

No response

Your installation details

1.

sentry:
  web:
    enabled: true
    replicas: 3
    podLabels:
      critical: "true"
    labels:
      critical: "true"
  worker:
    enabled: true
    replicas: 3
    podLabels:
      critical: "true"
    labels:
      critical: "true"
    livenessProbe:
      enabled: true
      exec:
        command:
          - sentry
          - exec
          - '-c'
          - >-
            from sentry.celery import app; import os;
            dest="celery@{}".format(os.environ["HOSTNAME"]);
            print(app.control.ping(destination=[dest],
            timeout=10)[0][dest]["ok"])
      initialDelaySeconds: 60
      periodSeconds: 60
      timeoutSeconds: 15
      failureThreshold: 5
      successThreshold: 1
  cron:
    enabled: true
    replicaCount: 3
    podLabels:
      critical: "true"
    labels:
      critical: "true"
relay:
  enabled: true
  replicaCount: 2
  podLabels:
    critical: "true"
  labels:
    critical: "true"

rabbitmq:
  pdb:
    create: false
  enabled: true
  clustering:
    forceBoot: true
    rebalance: true
  replicaCount: 3
  podLabels:
    critical: "true"
  labels:
    critical: "true"
redis:
  enabled: true
  replica:
    replicaCount: 3
  podLabels:
    critical: "true"
  labels:
    critical: "true"
  commonConfiguration: |-
    appendonly no
    save ""
clickhouse:
  enabled: true
  clickhouse:
    replicas: "3"
  podLabels:
    critical: "true"
  labels:
    critical: "true"
kafka:
  podLabels:
    critical: "true"
  labels:
    critical: "true"
postgresql:
  enabled: false
externalPostgresql:
  host: sentry-postgresql-ha-pgpool.sentry.svc.cluster.local
  port: 5432
  username: postgres
  password: secret
  database: postgres
  sslMode: disable
  connMaxAge: 0
user:
  create: true
  email: secret
  password: secret
google:
  clientId: secret
  clientSecret: secret
hooks:
  activeDeadlineSeconds: 1000000

2. chart version: 25.3.0, app version: 24.5.1

Steps to reproduce

Deploy Sentry with the chart with the sentry-web replica count increased and check whether the pods get scheduled on different worker nodes.

Screenshots

No response

Logs

No response

Additional context

No response

evgenii-avdiukhin commented 2 months ago

To rule out my cluster policies as the reason for the incorrect scheduling, I tried deploying Sentry without any policies and the problem remains: all the other pods distribute fine, but all 3 sentry-web replicas are still created on worker-1.

patsevanton commented 2 months ago

Hello! Could you format the issue? Please put your code inside ``` code blocks.

evgenii-avdiukhin commented 2 months ago

Hello! Could you format the issue? Please put your code inside ``` code blocks.

done

patsevanton commented 2 months ago

Did you try installing Sentry on a clean k8s cluster?

Could you run:

kubectl describe nodes | grep Taints
evgenii-avdiukhin commented 2 months ago

Did you try installing Sentry on a clean k8s cluster?

Could you run:

kubectl describe nodes | grep Taints

The cluster is pretty fresh; it was deployed with Kubespray and has all the default components plus monitoring, Jenkins, and some minor stuff. Here are the taints from the command you provided above:


Taints:             node-role.kubernetes.io/control-plane:NoSchedule
Taints:             node-role.kubernetes.io/control-plane:NoSchedule
Taints:             <none>
Taints:             <none>
Taints:             <none>
Taints:             <none>
Taints:             <none>
patsevanton commented 2 months ago

Could you run:

kubectl describe deploy -n namespace sentry-worker
evgenii-avdiukhin commented 2 months ago

sure, here you go

kubectl describe deploy -n sentry sentry-worker
Name:                   sentry-worker
Namespace:              sentry
CreationTimestamp:      Tue, 10 Sep 2024 16:41:52 +0400
Labels:                 app=sentry
                        app.kubernetes.io/managed-by=Helm
                        chart=sentry-25.3.0
                        heritage=Helm
                        release=sentry
Annotations:            deployment.kubernetes.io/revision: 1
                        meta.helm.sh/release-name: sentry
                        meta.helm.sh/release-namespace: sentry
Selector:               app=sentry,release=sentry,role=worker
Replicas:               3 desired | 3 updated | 3 total | 3 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:       app=sentry
                critical=true
                release=sentry
                role=worker
  Annotations:  checksum/config.yaml: 21302c385be962520babe466fe4164f3744f6873c1e19c2ccb92a8b39afdc057
                checksum/configYml: 44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a
                checksum/sentryConfPy: d5f85a6a8afbc55eebe23801e1a51a0fb4c0428c9a73ef6708d8dc83e079cd49
  Containers:
   sentry-worker:
    Image:      getsentry/sentry:24.5.1
    Port:       <none>
    Host Port:  <none>
    Command:
      sentry
    Args:
      run
      worker
    Liveness:  exec [sentry exec -c from sentry.celery import app; import os; dest="celery@{}".format(os.environ["HOSTNAME"]); print(app.control.ping(destination=[dest], timeout=5)[0][dest]["ok"])] delay=10s timeout=15s period=60s #success=1 #failure=5
    Environment:
      C_FORCE_ROOT:       true
      SNUBA:              http://sentry-snuba:1218
      VROOM:              http://sentry-vroom:8085
      SENTRY_SECRET_KEY:  <set to the key 'key' in secret 'sentry-sentry-secret'>  Optional: false
      POSTGRES_PASSWORD:  QrX0KlPh2i
      POSTGRES_USER:      postgres
      POSTGRES_NAME:      postgres
      POSTGRES_HOST:      sentry-postgresql-ha-pgpool.sentry.svc.cluster.local
      POSTGRES_PORT:      5432
    Mounts:
      /etc/sentry from config (ro)
      /var/lib/sentry/files from sentry-data (rw)
  Volumes:
   config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      sentry-sentry
    Optional:  false
   sentry-data:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Progressing    True    NewReplicaSetAvailable
  Available      True    MinimumReplicasAvailable
OldReplicaSets:  <none>
NewReplicaSet:   sentry-worker-84b585cfbc (3/3 replicas created)
Events:          <none>

I will also provide the sentry-web describe output in case you meant that one, because the problem is actually with web:

kubectl describe deploy -n sentry sentry-web   
Name:                   sentry-web
Namespace:              sentry
CreationTimestamp:      Tue, 10 Sep 2024 16:41:52 +0400
Labels:                 app=sentry
                        app.kubernetes.io/managed-by=Helm
                        chart=sentry-25.3.0
                        heritage=Helm
                        release=sentry
Annotations:            deployment.kubernetes.io/revision: 1
                        meta.helm.sh/release-name: sentry
                        meta.helm.sh/release-namespace: sentry
Selector:               app=sentry,release=sentry,role=web
Replicas:               3 desired | 3 updated | 3 total | 3 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:       app=sentry
                critical=true
                release=sentry
                role=web
  Annotations:  checksum/config.yaml: 21302c385be962520babe466fe4164f3744f6873c1e19c2ccb92a8b39afdc057
                checksum/configYml: 44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a
                checksum/sentryConfPy: d5f85a6a8afbc55eebe23801e1a51a0fb4c0428c9a73ef6708d8dc83e079cd49
  Containers:
   sentry-web:
    Image:      getsentry/sentry:24.5.1
    Port:       9000/TCP
    Host Port:  0/TCP
    Command:
      sentry
      run
      web
    Liveness:   http-get http://:9000/_health/ delay=10s timeout=2s period=10s #success=1 #failure=5
    Readiness:  http-get http://:9000/_health/ delay=10s timeout=2s period=10s #success=1 #failure=5
    Environment:
      SNUBA:              http://sentry-snuba:1218
      VROOM:              http://sentry-vroom:8085
      SENTRY_SECRET_KEY:  <set to the key 'key' in secret 'sentry-sentry-secret'>  Optional: false
      POSTGRES_PASSWORD:  secret
      POSTGRES_USER:      postgres
      POSTGRES_NAME:      postgres
      POSTGRES_HOST:      sentry-postgresql-ha-pgpool.sentry.svc.cluster.local
      POSTGRES_PORT:      5432
    Mounts:
      /etc/sentry from config (ro)
      /var/lib/sentry/files from sentry-data (rw)
  Volumes:
   config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      sentry-sentry
    Optional:  false
   sentry-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  sentry-data
    ReadOnly:   false
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   sentry-web-5886c4cb48 (3/3 replicas created)
Events:          <none>

As I was scrolling through the sentry-web describe output, I noticed the PVC it mounts, sentry-data. That volume sits exactly on the worker-1 node, which hosts all the sentry-web pods. Mentioning it just in case it matters.
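If it does matter, my guess is that the local-path PV behind sentry-data is pinned to worker-1 through node affinity, so every pod that mounts it can only land on that node. An illustrative shape of such a PV (not the exact object from my cluster; names and paths are placeholders):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-xxxxxxxx                   # provisioner-generated name (placeholder)
spec:
  accessModes:
    - ReadWriteOnce                    # local-path only supports RWO
  storageClassName: local-path
  hostPath:
    path: /opt/local-path-provisioner/pvc-xxxxxxxx_sentry_sentry-data   # placeholder path
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-1             # the node the volume was provisioned on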

patsevanton commented 2 months ago

Try strategyType: Recreate, see https://github.com/sentry-kubernetes/charts/blob/develop/charts/sentry/values.yaml#L167
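In the values it would look something like this; I have not double-checked the exact key for chart 25.3.0, so verify it against the linked values.yaml:

sentry:
  web:
    # Recreate terminates the old web pod first, so it releases the RWO volume
    # before the replacement pod is scheduled.
    strategyType: Recreate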

evgenii-avdiukhin commented 2 months ago

critical=true is my custom label. The problem is not with the worker but with the web deployment.

evgenii-avdiukhin commented 2 months ago

I think I will try accessMode: ReadWriteMany for the file storage; in this case I guess it might work.
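Something along these lines in the values; the key path under filestore is my assumption from skimming the chart, I have not verified it yet:

filestore:
  backend: filesystem
  filesystem:
    persistence:
      enabled: true
      # ReadWriteMany would let pods on different nodes mount the same volume.
      accessMode: ReadWriteMany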

patsevanton commented 2 months ago

Have you tried it?

evgenii-avdiukhin commented 2 months ago

Have you tried it?

Yes, I did. I tried changing the PVC accessMode to ReadWriteMany, but now I am stuck with this:

failed to provision volume with StorageClass "local-path": NodePath only supports ReadWriteOnce and ReadWriteOncePod (1.22+) access modes

That is because I use local-path for the PVC; I need to figure out how to dynamically provision ReadWriteMany volumes in my cloud. Anyway, it seems like the root cause is found and the issue is resolved. Thanks for your help!
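In case it helps anyone else, the follow-up plan is to back the filestore with a storage class whose provisioner actually supports ReadWriteMany (for example an NFS-based one); the class name below is hypothetical:

filestore:
  backend: filesystem
  filesystem:
    persistence:
      enabled: true
      storageClass: nfs-client     # hypothetical RWX-capable class, instead of local-path
      accessMode: ReadWriteMany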