evgenii-avdiukhin closed this issue 2 months ago
To rule out my cluster policies as the reason for the incorrect scheduling, I tried deploying Sentry without any policies, and the problem remains: all the pods distribute fine, but all 3 sentry-web replicas are created on worker-1.
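(The placement can be checked with something along these lines; the label selector matches the deployment's own labels:)
```
kubectl -n sentry get pods -l role=web -o wide
```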
Hello! Could you format the issue? Put your code in ``` blocks.
done
Are you trying to install Sentry on a clean k8s cluster?
Could you run:
```
kubectl describe nodes | grep Taints
```
The cluster is pretty fresh; it was deployed with kubespray and has all the default components plus monitoring, Jenkins, and some minor stuff. Here are the taints from the command you provided above:
```
Taints: node-role.kubernetes.io/control-plane:NoSchedule
Taints: node-role.kubernetes.io/control-plane:NoSchedule
Taints: <none>
Taints: <none>
Taints: <none>
Taints: <none>
Taints: <none>
```
Could you run:
```
kubectl describe deploy -n namespace sentry-worker
```
sure, here you go
```
kubectl describe deploy -n sentry sentry-worker
Name: sentry-worker
Namespace: sentry
CreationTimestamp: Tue, 10 Sep 2024 16:41:52 +0400
Labels: app=sentry
app.kubernetes.io/managed-by=Helm
chart=sentry-25.3.0
heritage=Helm
release=sentry
Annotations: deployment.kubernetes.io/revision: 1
meta.helm.sh/release-name: sentry
meta.helm.sh/release-namespace: sentry
Selector: app=sentry,release=sentry,role=worker
Replicas: 3 desired | 3 updated | 3 total | 3 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app=sentry
critical=true
release=sentry
role=worker
Annotations: checksum/config.yaml: 21302c385be962520babe466fe4164f3744f6873c1e19c2ccb92a8b39afdc057
checksum/configYml: 44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a
checksum/sentryConfPy: d5f85a6a8afbc55eebe23801e1a51a0fb4c0428c9a73ef6708d8dc83e079cd49
Containers:
sentry-worker:
Image: getsentry/sentry:24.5.1
Port: <none>
Host Port: <none>
Command:
sentry
Args:
run
worker
Liveness: exec [sentry exec -c from sentry.celery import app; import os; dest="celery@{}".format(os.environ["HOSTNAME"]); print(app.control.ping(destination=[dest], timeout=5)[0][dest]["ok"])] delay=10s timeout=15s period=60s #success=1 #failure=5
Environment:
C_FORCE_ROOT: true
SNUBA: http://sentry-snuba:1218
VROOM: http://sentry-vroom:8085
SENTRY_SECRET_KEY: <set to the key 'key' in secret 'sentry-sentry-secret'> Optional: false
POSTGRES_PASSWORD: QrX0KlPh2i
POSTGRES_USER: postgres
POSTGRES_NAME: postgres
POSTGRES_HOST: sentry-postgresql-ha-pgpool.sentry.svc.cluster.local
POSTGRES_PORT: 5432
Mounts:
/etc/sentry from config (ro)
/var/lib/sentry/files from sentry-data (rw)
Volumes:
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: sentry-sentry
Optional: false
sentry-data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
Conditions:
Type Status Reason
---- ------ ------
Progressing True NewReplicaSetAvailable
Available True MinimumReplicasAvailable
OldReplicaSets: <none>
NewReplicaSet: sentry-worker-84b585cfbc (3/3 replicas created)
Events: <none>
```
I will also provide the sentry-web describe in case you mixed them up, because the problem is actually with web:
```
kubectl describe deploy -n sentry sentry-web
Name: sentry-web
Namespace: sentry
CreationTimestamp: Tue, 10 Sep 2024 16:41:52 +0400
Labels: app=sentry
app.kubernetes.io/managed-by=Helm
chart=sentry-25.3.0
heritage=Helm
release=sentry
Annotations: deployment.kubernetes.io/revision: 1
meta.helm.sh/release-name: sentry
meta.helm.sh/release-namespace: sentry
Selector: app=sentry,release=sentry,role=web
Replicas: 3 desired | 3 updated | 3 total | 3 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app=sentry
critical=true
release=sentry
role=web
Annotations: checksum/config.yaml: 21302c385be962520babe466fe4164f3744f6873c1e19c2ccb92a8b39afdc057
checksum/configYml: 44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a
checksum/sentryConfPy: d5f85a6a8afbc55eebe23801e1a51a0fb4c0428c9a73ef6708d8dc83e079cd49
Containers:
sentry-web:
Image: getsentry/sentry:24.5.1
Port: 9000/TCP
Host Port: 0/TCP
Command:
sentry
run
web
Liveness: http-get http://:9000/_health/ delay=10s timeout=2s period=10s #success=1 #failure=5
Readiness: http-get http://:9000/_health/ delay=10s timeout=2s period=10s #success=1 #failure=5
Environment:
SNUBA: http://sentry-snuba:1218
VROOM: http://sentry-vroom:8085
SENTRY_SECRET_KEY: <set to the key 'key' in secret 'sentry-sentry-secret'> Optional: false
POSTGRES_PASSWORD: secret
POSTGRES_USER: postgres
POSTGRES_NAME: postgres
POSTGRES_HOST: sentry-postgresql-ha-pgpool.sentry.svc.cluster.local
POSTGRES_PORT: 5432
Mounts:
/etc/sentry from config (ro)
/var/lib/sentry/files from sentry-data (rw)
Volumes:
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: sentry-sentry
Optional: false
sentry-data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: sentry-data
ReadOnly: false
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing True NewReplicaSetAvailable
OldReplicaSets: <none>
NewReplicaSet: sentry-web-5886c4cb48 (3/3 replicas created)
Events: <none>
```
Also, as I was scrolling through the sentry-web describe, I noticed the PVC it reads from, sentry-data; this volume sits exactly on the worker-1 node, which holds all the sentry-web pods. Mentioning it just in case it matters.
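For example, one way to see which node the local-path PV is pinned to (<pv-name> is a placeholder taken from the PVC output):
```
# find the PV bound to the filestore claim
kubectl -n sentry get pvc sentry-data
# then look at the PV's node affinity (local-path pins it to one node);
# replace <pv-name> with the VOLUME value from the previous command
kubectl get pv <pv-name> -o yaml | grep -A 10 nodeAffinity
```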
https://github.com/sentry-kubernetes/charts/blob/develop/charts/sentry/values.yaml#L167
Try strategyType: Recreate
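For example, something like this (assuming the chart repo is added locally as `sentry` and the value path is `sentry.web.strategyType` as in the linked values.yaml; please double-check it for your chart version):
```
# sentry.web.strategyType is assumed from values.yaml;
# verify it against chart version 25.3.0 before applying
helm upgrade sentry sentry/sentry -n sentry \
  --version 25.3.0 --reuse-values \
  --set sentry.web.strategyType=Recreate
```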
critical=true is my custom label. The problem is not with worker but with web.
I think I will try accessMode: ReadWriteMany for the filestorage; in that case I guess it might work.
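If I read the chart right, that would be roughly the following; the filestore value path is my guess and needs checking against the chart's values.yaml:
```
# filestore.filesystem.persistence.accessMode is a guessed value path;
# verify against values.yaml before applying
helm upgrade sentry sentry/sentry -n sentry --reuse-values \
  --set filestore.filesystem.persistence.accessMode=ReadWriteMany
```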
Did you try it?
Yes, I did; I tried to change the PVC accessMode to ReadWriteMany,
but now I am stuck with this:
```
failed to provision volume with StorageClass "local-path": NodePath only supports ReadWriteOnce and ReadWriteOncePod (1.22+) access modes
```
because I use local-path for the PVC.
I need to figure out how to dynamically provision volumes in my cloud.
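Roughly what I have in mind: first see which StorageClasses exist and whether any of them can do ReadWriteMany, then point the filestore at one. The "nfs-client" class below is only a hypothetical example and the value path is a guess:
```
# list available storage classes and their provisioners to find
# (or plan for) one that supports ReadWriteMany
kubectl get storageclass
# hypothetical example: point the filestore at an NFS-backed RWX class
# named "nfs-client" (value path is a guess, check the chart's values.yaml)
helm upgrade sentry sentry/sentry -n sentry --reuse-values \
  --set filestore.filesystem.persistence.storageClass=nfs-client
```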
Anyway, it seems like the issue is resolved.
Thanks for your help!
Describe the bug (actual behavior)
I have Sentry deployed in Kubernetes using the Helm chart. I have the key component replicas increased to 2-3 (web, worker, cron, relay, etc.). I also have a pod-anti-affinity cluster policy deployed with Kyverno. All the pods except sentry-worker are distributed evenly across the worker nodes; all 3 sentry-worker pods are scheduled on worker-1. Moreover, whenever I shut down worker-1 to check if the pods evict, the sentry-web pods get stuck in Pending with this event:
```
0/8 nodes are available: 3 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 5 node(s) didn't match pod anti-affinity rules. preemption: 0/8 nodes are available: 3 Preemption is not helpful for scheduling, 5 No preemption victims found for incoming pod.
```
The Kyverno policies are cluster-wide and are applied the same way to all Sentry objects. The problem is only with the sentry-web deployment; everything else works fine, distributes, and evicts as it should. Am I missing something here? I have been stuck with this issue for a couple of days and can't find any solution.
Expected behavior
No response
Your installation details
1.
2. chart version: 25.3.0, app version: 24.5.1
Steps to reproduce
Deploy Sentry with the chart, with sentry-web replicas increased, and see whether they schedule on different workers (a rough sketch follows below).
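For example (the release name, namespace, and value paths are assumptions; adjust for your setup):
```
# install the chart with web replicas bumped; sentry.web.replicas is an
# assumed value path, check values.yaml for the chart version in use
helm repo add sentry https://sentry-kubernetes.github.io/charts
helm install sentry sentry/sentry -n sentry --create-namespace \
  --version 25.3.0 --set sentry.web.replicas=3
# then check which nodes the web pods landed on
kubectl -n sentry get pods -l role=web -o wide
```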
Screenshots
No response
Logs
No response
Additional context
No response