DaazKu opened 1 year ago
I am also encountering this issue with the gateway deployment when using version 1.9.3
of the tempo-distributed Helm chart.
I've tried removing the gateway affinity TPL but that doesn't work. Here's that stanza.
```yaml
gateway:
  enabled: true
  affinity: {}
```
The resulting warning, emitted twice, is:

```
coalesce.go:289: warning: destination for tempo-distributed.gateway.affinity is a table. Ignoring non-table value (podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          {{- include "tempo.selectorLabels" (dict "ctx" . "component" "gateway") | nindent 10 }}
      topologyKey: kubernetes.io/hostname
  preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            {{- include "tempo.selectorLabels" (dict "ctx" . "component" "gateway") | nindent 12 }}
        topologyKey: topology.kubernetes.io/zone
)
```
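My rough understanding of this warning (a sketch of the behavior, not the actual Helm source): the chart's default for `gateway.affinity` is a multi-line *string* (a Helm template), while my override is a *map*, and Helm's value coalescing refuses to merge a table with a non-table value. Something like:

```python
def coalesce(dest, src, path=""):
    """Sketch of Helm's values coalescing (user values in dest win over
    chart defaults in src): when the destination holds a table (dict)
    but the chart default is a non-table value, the default is ignored
    and a warning like coalesce.go:289 is printed."""
    for key, default in src.items():
        full = f"{path}.{key}" if path else key
        if key not in dest:
            dest[key] = default
        elif isinstance(dest[key], dict):
            if isinstance(default, dict):
                coalesce(dest[key], default, full)
            else:
                print(f"warning: destination for {full} is a table. "
                      f"Ignoring non-table value")
        # a non-dict destination value simply keeps the user's override
    return dest

# gateway.affinity: {} is a table, but the chart default is a string
# template -- hence the warning, even though the override still applies.
user_values = {"gateway": {"enabled": True, "affinity": {}}}
chart_defaults = {"gateway": {"affinity": "podAntiAffinity: ...", "replicas": 1}}
merged = coalesce(user_values, chart_defaults)
```

So the warning itself is mostly cosmetic: the empty map does win, but the merge can't reconcile the two types.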
I also modified the default affinity stanza to use `preferredDuringSchedulingIgnoredDuringExecution` rather than the default `requiredDuringSchedulingIgnoredDuringExecution`, and that didn't work either. Here's that stanza.
```yaml
affinity: |
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            {{- include "tempo.selectorLabels" (dict "ctx" . "component" "gateway") | nindent 10 }}
        topologyKey: kubernetes.io/hostname
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              {{- include "tempo.selectorLabels" (dict "ctx" . "component" "gateway") | nindent 12 }}
          topologyKey: topology.kubernetes.io/zone
```
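As an aside (and possibly part of my problem), I believe a `preferredDuringSchedulingIgnoredDuringExecution` entry must wrap its selector in a `podAffinityTerm` with a `weight`, so the first item above may be schema-invalid. A valid version would look something like this (labels are illustrative):

```yaml
preferredDuringSchedulingIgnoredDuringExecution:
  - weight: 100
    podAffinityTerm:
      labelSelector:
        matchLabels:
          app.kubernetes.io/component: gateway
      topologyKey: kubernetes.io/hostname
```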
Here is the deployment error:

```yaml
status:
  conditions:
    - lastTransitionTime: "2024-04-23T15:34:02Z"
      lastUpdateTime: "2024-04-23T15:34:02Z"
      message: Deployment does not have minimum availability.
      reason: MinimumReplicasUnavailable
      status: "False"
      type: Available
    - lastTransitionTime: "2024-04-23T15:34:02Z"
      lastUpdateTime: "2024-04-23T15:34:02Z"
      message: ReplicaSet "tempo-gateway-7789b77696" is progressing.
      reason: ReplicaSetUpdated
      status: "True"
      type: Progressing
```
Here is the pod-level error:

```
Name:             tempo-gateway-69cf4bd4dd-ntnf9
Namespace:        observability
Priority:         0
Service Account:  tempo
Node:             <none>
Labels:           app.kubernetes.io/component=gateway
                  app.kubernetes.io/instance=tempo
                  app.kubernetes.io/name=tempo
                  pod-template-hash=69cf4bd4dd
Annotations:      checksum/config: a465b5249192bb4606db6d4a260200bd3847b8b173f434841b597e87b4953b6b
Status:           Pending
IP:
IPs:              <none>
Controlled By:    ReplicaSet/tempo-gateway-69cf4bd4dd
Containers:
  nginx:
    Image:        docker.io/nginxinc/nginx-unprivileged:1.19-alpine
    Port:         8080/TCP
    Host Port:    0/TCP
    Readiness:    http-get http://:http-metrics/ delay=15s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /docker-entrypoint.d from docker-entrypoint-d-override (rw)
      /etc/nginx from config (rw)
      /tmp from tmp (rw)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      tempo-gateway
    Optional:  false
  tmp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  docker-entrypoint-d-override:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
QoS Class:                    BestEffort
Node-Selectors:               <none>
Tolerations:                  node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                              node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints:  topology.kubernetes.io/zone:ScheduleAnyway when max skew 1 is exceeded for selector app.kubernetes.io/component=gateway,app.kubernetes.io/instance=tempo,app.kubernetes.io/name=tempo
Events:
  Type     Reason            Age               From               Message
  ----     ------            ---               ----               -------
  Warning  FailedScheduling  4s (x3 over 25s)  default-scheduler  0/1 nodes are available: 1 node(s) didn't satisfy existing pods anti-affinity rules. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
```
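The `FailedScheduling` message is the scheduler's required anti-affinity check: a node is rejected if any pod already running on it has a required anti-affinity term whose label selector matches the incoming pod. A minimal sketch of that check (the labels and pod shapes here are illustrative, not the scheduler's actual types):

```python
def node_rejected(incoming_labels, pods_on_node):
    """Sketch of the required pod-anti-affinity check: the node is
    infeasible if any existing pod declares a required anti-affinity
    term whose label selector matches the incoming pod's labels."""
    for pod in pods_on_node:
        for term in pod.get("required_anti_affinity", []):
            if all(incoming_labels.get(k) == v
                   for k, v in term["match_labels"].items()):
                return True
    return False

# On a single-node minikube cluster, one existing gateway pod whose
# required anti-affinity selects on a label the incoming Tempo gateway
# pod also carries makes the only node infeasible -- hence
# "0/1 nodes are available".
existing = [{"required_anti_affinity": [
    {"match_labels": {"app.kubernetes.io/component": "gateway"}}]}]
tempo_gateway = {"app.kubernetes.io/component": "gateway",
                 "app.kubernetes.io/name": "tempo"}
print(node_rejected(tempo_gateway, existing))  # True
```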
I see that deployment-gateway.yaml does use `affinity`. As another data point, this error occurs on Kubernetes 1.29.4 and 1.30.0, which are the latest supported stable versions at the time of this comment. I'm using minikube version 1.32.0.
Your help is greatly appreciated.
UPDATE: The `coalesce.go:289: warning: destination for tempo-distributed.gateway.affinity is a table. Ignoring non-table value` warning occurs because the gateway deployment template assumes `podAntiAffinity` will always be defined. The workaround is to use the following stanza for it:
```yaml
gateway:
  # -- Affinity for gateway pods.
  # @default -- Hard node anti-affinity
  affinity: |-
    podAntiAffinity: {}
```
It turns out the scheduling issue for my deployment was the anti-affinity stanza of the Loki gateway pod conflicting with the Tempo gateway pod. I ended up changing the Loki gateway stanza to the following:
```yaml
gateway:
  # -- Affinity for gateway pods.
  # @default -- Hard node anti-affinity
  affinity: |-
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app.kubernetes.io/component: gateway
            topologyKey: kubernetes.io/hostname
```

Because this uses *preferred* rather than *required* anti-affinity, the scheduler will merely try to spread the gateway pods and can still place both on the single minikube node.
I defined the following in my values.yml. I would expect to find those on the pod definition, but they're not present.