ksingh-scogo closed this issue 8 months ago.
What are the values you are using? Don't need the entire values file from the chart, only the ones that you have changed.
@caleblloyd pls excuse the chattiness
Found a workaround for the health check failure: manually edit the StatefulSet and change the probe scheme from HTTP to HTTPS for the livenessProbe, readinessProbe, and startupProbe:

kubectl edit statefulset.apps/nats-jetstream
livenessProbe:
  httpGet:
    path: /healthz?js-enabled-only=true
    port: monitor
    scheme: HTTPS
readinessProbe:
  httpGet:
    path: /healthz?js-server-only=true
    port: monitor
    scheme: HTTPS
startupProbe:
  httpGet:
    path: /healthz
    port: monitor
    scheme: HTTPS
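For reference, the same change can probably be applied non-interactively with a strategic merge patch instead of kubectl edit. This is an untested sketch (it assumes the NATS container is named nats, as shown in the pod description further down, and like kubectl edit it will be overwritten on the next helm upgrade):

kubectl patch statefulset nats-jetstream --type strategic -p '
spec:
  template:
    spec:
      containers:
      - name: nats
        livenessProbe:
          httpGet:
            scheme: HTTPS
        readinessProbe:
          httpGet:
            scheme: HTTPS
        startupProbe:
          httpGet:
            scheme: HTTPS
'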
Because monitor.tls has been enabled in values.yaml, NATS starts its monitoring endpoint on HTTPS:

Starting https monitor on 0.0.0.0:8222

So when monitor.tls is enabled in values.yaml, the Helm template should set the StatefulSet health check scheme to HTTPS.
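Until the chart does that, a declarative workaround might be to merge the scheme into the nats container from values.yaml. This is an untested sketch that assumes container.merge is deep-merged into the generated container, so the existing path and port of each probe are preserved:

container:
  merge:
    livenessProbe:
      httpGet:
        scheme: HTTPS
    readinessProbe:
      httpGet:
        scheme: HTTPS
    startupProbe:
      httpGet:
        scheme: HTTPS

Unlike editing the StatefulSet by hand, that should survive a helm upgrade.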
Still getting [ERR] Could not find server_id: invalid character 'C' looking for beginning of value in the prom-exporter container:
[35] 2023/08/21 14:58:17.670767 [ERR] Could not find server_id: invalid character 'C' looking for beginning of value
[35] 2023/08/21 14:58:47.670179 [ERR] Could not find server_id: invalid character 'C' looking for beginning of value
[35] 2023/08/21 14:59:17.670885 [ERR] Could not find server_id: invalid character 'C' looking for beginning of value
[35] 2023/08/21 14:59:47.670274 [ERR] Could not find server_id: invalid character 'C' looking for beginning of value
[35] 2023/08/21 15:00:17.670546 [ERR] Could not find server_id: invalid character 'C' looking for beginning of value
[35] 2023/08/21 15:00:47.670372 [ERR] Could not find server_id: invalid character 'C' looking for beginning of value
[35] 2023/08/21 15:01:17.670191 [ERR] Could not find server_id: invalid character 'C' looking for beginning of value
All health checks are passing, for the record
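One way to confirm the monitor endpoint is now HTTPS-only is to port-forward it and hit both schemes. My guess (hedged) at what the exporter is tripping over: Go's HTTP server answers a plaintext request to a TLS port with a 400 and the body "Client sent an HTTP request to an HTTPS server", whose leading 'C' would explain the JSON parse error above.

kubectl port-forward -n nats pod/nats-jetstream-0 8222:8222

# in another terminal
curl -i  http://localhost:8222/healthz    # expected: 400 over plaintext if monitor TLS is on
curl -ik https://localhost:8222/healthz   # -k skips cert verification; expected: 200 with a small JSON body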
@caleblloyd your pointers on this would be of great help
I think this is happening because TLS for the monitor API is enabled. When I generate the helm-templated version, it has:
- args:
  - -port=7777
  - -connz
  - -routez
  - -subz
  - -varz
  - -prefix=nats
  - -use_internal_server_id
  - -jsz=all
  - http://localhost:8222/
  image: natsio/prometheus-nats-exporter:0.13.0
  name: prom-exporter
  ports:
  - containerPort: 7777
    name: prom-metrics
Note that it passes http://localhost:8222/ instead of https://localhost:8222/.
So the solution would be to have the chart generate the https URL instead.
The other issue is that there doesn't seem to be a way to specify the tlscacert/tlscert/tlskey info, or the DNS name to use.
So the following may work to fix the chart:
http://localhost:8222/ would need to become https://<some_configurable_hostname>:<monitor_port>/
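In the meantime, the chart's promExporter.merge extension point might allow overriding the scrape URL from values.yaml. This is an untested sketch with several assumptions: that merge replaces the container's args list wholesale, that the existing nats-tls volume's secret actually contains a ca.crt, that the exporter's -tlscacert flag (mentioned above) applies to the scrape connection, and that the certificate's SANs cover the hostname used in the URL:

promExporter:
  merge:
    args:
    - -port=7777
    - -connz
    - -routez
    - -subz
    - -varz
    - -prefix=nats
    - -use_internal_server_id
    - -jsz=all
    - -tlscacert=/etc/nats-certs/nats/ca.crt   # assumes the mounted secret includes ca.crt
    - https://localhost:8222/                  # may need the certificate's DNS name instead of localhost
    volumeMounts:
    - name: nats-tls                           # volume already defined in the pod for the nats container
      mountPath: /etc/nats-certs/nats

A proper fix in the chart would still be preferable, since this hard-codes the exporter's full argument list.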
After deploying the NATS cluster using Helm, the prom-exporter container inside the JetStream pod is failing with the error [ERR] Could not find server_id: invalid character 'C' looking for beginning of value.
This is causing the health check to fail with:
Warning  Unhealthy  75s (x16 over 3m45s)  kubelet  Startup probe failed: HTTP probe failed with statuscode: 400
kubectl get all
NAME                              TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)                               AGE
service/nats-jetstream            LoadBalancer   10.0.111.248   20.219.34.176   4222:31806/TCP,8080:30823/TCP         13m
service/nats-jetstream-headless   ClusterIP      None                           4222/TCP,8080/TCP,6222/TCP,8222/TCP   13m

NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nats-jetstream-box   1/1     1            1           13m

NAME                                            DESIRED   CURRENT   READY   AGE
replicaset.apps/nats-jetstream-box-56586bdf9f   1         1         1       13m

NAME                              READY   AGE
statefulset.apps/nats-jetstream   0/3     13m
################################################################################
# Global options
################################################################################
global:
  image:
    # global image pull policy to use for all container images in the chart
  # global labels will be applied to all resources deployed by the chart
  labels:
    app.kubernetes.io/name: nats
    app.kubernetes.io/version: 2.9.21
    app.kubernetes.io/managed-by: Helm

################################################################################
# Common options
################################################################################
# override name of the chart
nameOverride:
# override full name of the chart+release
fullnameOverride: nats-jetstream
# override the namespace that resources are installed into
namespaceOverride:

# reference a common CA Certificate or Bundle in all nats config tls blocks and nats-box contexts
# note: tls.verify still must be set in the appropriate nats config tls blocks to require mTLS
tlsCA:
  enabled: false
  # set configMapName in order to mount an existing configMap to dir
  configMapName: nats-ca
  # set secretName in order to mount an existing secretName to dir
  secretName:
  # directory to mount the configMap or secret to
  dir: /etc/nats-ca-cert
  # key in the configMap or secret that contains the CA Certificate or Bundle
  key: ca.crt

################################################################################
# NATS Stateful Set and associated resources
################################################################################

############################################################
# NATS config
############################################################
config:
  cluster:
    enabled: true
    port: 6222
    # must be 2 or higher when jetstream is enabled
    replicas: 3

  jetstream:
    enabled: true
    fileStore:
      enabled: true
      dir: /data

  nats:
    port: 4222
    tls:
      enabled: true
      # set secretName in order to mount an existing secret to dir

  websocket:
    enabled: true
    port: 8080
    tls:
      enabled: false
      # set secretName in order to mount an existing secret to dir

  monitor:
    enabled: true
    port: 8222
    tls:
      # config.nats.tls must be enabled also
      enabled: true

  resolver:
    enabled: false
    dir: /data/resolver

############################################################
# stateful set -> pod template -> nats container
############################################################
container:
  image:
    repository: nats
    tag: 2.9.21-alpine
    pullPolicy: IfNotPresent
  merge:
    # recommended limit is at least 2 CPU cores and 8Gi Memory for production JetStream clusters

  # container port options
  # must be enabled in the config section also
  # https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.24/#containerport-v1-core
  ports:
    nats: {}
    leafnodes: {}
    websocket: {}
    mqtt: {}
    cluster: {}
    gateway: {}
    monitor: {}
    profiling: {}

  # map with key as env var name, value can be string or map
  # example:
  #
  #   env:
  #     GOMEMLIMIT: 7GiB
  #     TOKEN:
  #       valueFrom:
  #         secretKeyRef:
  #           name: nats-auth
  #           key: token
  env: {}

  # merge or patch the container
  # https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.24/#container-v1-core
  merge: {}
  patch: []

############################################################
# stateful set -> pod template -> reloader container
############################################################
reloader:
  enabled: true
  image:
    repository: natsio/nats-server-config-reloader
    tag: 0.11.0
    pullPolicy: IfNotPresent
  merge:
    # recommended limit is at least 2 CPU cores and 8Gi Memory for production JetStream clusters

  # env var map, see nats.env for an example
  env: {}

  # all nats container volume mounts with the following prefixes
  # will be mounted into the reloader container
  natsVolumeMountPrefixes:
  - /etc/

  # merge or patch the container
  # https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.24/#container-v1-core
  merge: {}
  patch: []

############################################################
# stateful set -> pod template -> prom-exporter container
############################################################
# config.monitor must be enabled
promExporter:
  enabled: true
  image:
    repository: natsio/prometheus-nats-exporter
    tag: 0.12.0
    pullPolicy:
    registry:

  port: 7777

  # env var map, see nats.env for an example
  env: {}

  # merge or patch the container
  # https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.24/#container-v1-core
  merge: {}
  patch: []

############################################################
# prometheus pod monitor
############################################################
podMonitor:
  enabled: true

############################################################
# service
############################################################
# Bug: currently the NATS Helm chart does not allow adding annotations at the service level
# See: https://github.com/nats-io/k8s/issues/784
# Workaround: add the annotation manually
#   kubectl annotate service -n nats nats-jetstream service.beta.kubernetes.io/azure-load-balancer-resource-group=AzureResourceGroup
service:
  enabled: true
  merge:
    spec:
      type: LoadBalancer
      loadBalancerIP: "x.x.x.x"
  patch: []
  # defaults to "{{ include "nats.fullname" $ }}"
  name:

############################################################
# other nats extension points
############################################################

# stateful set
statefulSet:
  # merge or patch the stateful set
  # https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.24/#statefulset-v1-apps
  merge: {}
  patch: []
  # defaults to "{{ include "nats.fullname" $ }}"
  name:

# stateful set -> pod template
podTemplate:
  # adds a hash of the ConfigMap as a pod annotation
  # this will cause the StatefulSet to roll when the ConfigMap is updated
  configChecksumAnnotation: true

  # map of topologyKey: topologySpreadConstraint
  # labelSelector will be added to match StatefulSet pods
  #
  #   topologySpreadConstraints:
  #     kubernetes.io/hostname:
  #       maxSkew: 1
  #       whenUnsatisfiable: DoNotSchedule
  #
  topologySpreadConstraints: {}

  # merge or patch the pod template
  # https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.24/#pod-v1-core
  merge: {}
  patch: []

# headless service
headlessService:
  enabled: false
  # merge or patch the headless service
  # https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.24/#service-v1-core
  merge: {}
  patch: []
  # defaults to "{{ include "nats.fullname" $ }}-headless"
  name:

# config map
configMap:
  # merge or patch the config map
  # https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.24/#configmap-v1-core
  merge: {}
  patch: []
  # defaults to "{{ include "nats.fullname" $ }}-config"
  name:

# pod disruption budget
podDisruptionBudget:
  enabled: true
  # merge or patch the pod disruption budget
  # https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.24/#poddisruptionbudget-v1-policy
  merge: {}
  patch: []
  # defaults to "{{ include "nats.fullname" $ }}"
  name:

# service account
serviceAccount:
  enabled: false
  # merge or patch the service account
  # https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.24/#serviceaccount-v1-core
  merge: {}
  patch: []
  # defaults to "{{ include "nats.fullname" $ }}"
  name:

################################################################################
# natsBox
#
# NATS Box Deployment and associated resources
################################################################################
natsBox:
  enabled: true

  ############################################################
  # NATS contexts
  ############################################################
  contexts:
    default:
      creds:
        # set contents in order to create a secret with the creds file contents

  # name of context to select by default
  defaultContextName: default

  ############################################################
  # deployment -> pod template -> nats-box container
  ############################################################
  container:
    image:
      repository: natsio/nats-box
      tag: 0.13.8
      pullPolicy: IfNotPresent
      registry:

  ############################################################
  # other nats-box extension points
  ############################################################

  # deployment
  deployment:
    # merge or patch the deployment

  # deployment -> pod template
  podTemplate:
    # merge or patch the pod template

  # contexts secret
  contextsSecret:
    # merge or patch the context secret

  # contents secret
  contentsSecret:
    # merge or patch the contents secret

  # service account
  serviceAccount:
    enabled: false
    # merge or patch the service account

################################################################################
# Extra user-defined resources
################################################################################
#
# add arbitrary user-generated resources
# example:
#
#   config:
#     websocket:
#       enabled: true
#
#   extraResources:
#   - apiVersion: networking.istio.io/v1beta1
#     kind: VirtualService
#     metadata:
#       name:
#         $tplYaml: >
#           {{ include "nats.fullname" $ | quote }}
#       labels:
#         $tplYaml: |
#           {{ include "nats.labels" $ }}
#     spec:
#       hosts:
#       - demo.nats.io
#       gateways:
#       - my-gateway
#       http:
#       - name: default
#         match:
#         - name: root
#           uri:
#             exact: /
#         route:
#         - destination:
#             host:
#               $tplYaml: >
#                 {{ .Values.service.name | quote }}
#             port:
#               number:
#                 $tplYaml: >
#                   {{ .Values.config.websocket.port }}
#
extraResources: []
Name:             nats-jetstream-0
Namespace:        nats
Priority:         0
Service Account:  default
Node:             aks-ondemand-20722617-vmss000002/10.224.0.6
Start Time:       Sun, 20 Aug 2023 23:42:54 +0530
Labels:           app.kubernetes.io/component=nats
                  app.kubernetes.io/instance=nats
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=nats
                  app.kubernetes.io/version=2.9.21
                  controller-revision-hash=nats-jetstream-8b6bb9b85
                  environment=staging
                  helm.sh/chart=nats-1.0.2
                  statefulset.kubernetes.io/pod-name=nats-jetstream-0
Annotations:      checksum/config: b40d09e5645850ef7937f414d605f9f91199172781c2f89e698bffcef15ff9ee
Status:           Running
IP:               10.244.1.35
IPs:
  IP:           10.244.1.35
Controlled By:  StatefulSet/nats-jetstream
Containers:
  nats:
    Container ID:  containerd://44dc3d9a95f761b42cc6809604f216f1b79b4cb588a05f5c53678a120773dbeb
    Image:         nats:2.9.21-alpine
    Image ID:      docker.io/library/nats@sha256:511f5c4cfc6fdd61eb66afab99dfb38bed69aae630d8d5b36bc9bfc716723cd8
    Ports:         4222/TCP, 8080/TCP, 6222/TCP, 8222/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP
    Args:
      --config
      /etc/nats-config/nats.conf
    State:          Running
      Started:      Sun, 20 Aug 2023 23:43:07 +0530
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     500m
      memory:  128Mi
    Requests:
      cpu:     250m
      memory:  64Mi
    Liveness:   http-get http://:monitor/healthz%3Fjs-enabled-only=true delay=10s timeout=5s period=30s #success=1 #failure=3
    Readiness:  http-get http://:monitor/healthz%3Fjs-server-only=true delay=10s timeout=5s period=10s #success=1 #failure=3
    Startup:    http-get http://:monitor/healthz delay=10s timeout=5s period=10s #success=1 #failure=90
    Environment:
      POD_NAME:     nats-jetstream-0 (v1:metadata.name)
      SERVER_NAME:  $(POD_NAME)
    Mounts:
      /data from nats-jetstream-js (rw)
      /etc/nats-certs/nats from nats-tls (rw)
      /etc/nats-config from config (rw)
      /var/run/nats from pid (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-bvdpc (ro)
  reloader:
    Container ID:  containerd://834f4510bcdec3f46e532fd88e9a93e1d746300004eccc523c46dff844278a19
    Image:         natsio/nats-server-config-reloader:0.11.0
    Image ID:      docker.io/natsio/nats-server-config-reloader@sha256:c3a755eab2cc4702878d8d7bb75b82cd692a2557315cd18a0fac84f77f9253c9
    Port:
Host Port:
Args:
-pid
/var/run/nats/nats.pid
-config
/etc/nats-config/nats.conf
-config
/etc/nats-certs/nats/tls.crt
-config
/etc/nats-certs/nats/tls.key
State: Running
Started: Sun, 20 Aug 2023 23:43:08 +0530
Ready: True
Restart Count: 0
Limits:
cpu: 50m
memory: 64Mi
Requests:
cpu: 50m
memory: 64Mi
Environment:
Mounts:
/etc/nats-certs/nats from nats-tls (rw)
/etc/nats-config from config (rw)
/var/run/nats from pid (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-bvdpc (ro)
prom-exporter:
Container ID: containerd://23ada762bd737e752e17a00c9b223468dce58de93e066e4e11c2ab388ba169a8
Image: natsio/prometheus-nats-exporter:0.12.0
Image ID: docker.io/natsio/prometheus-nats-exporter@sha256:74e768968abb7883f6c89639a4d7d8f59054c61297c1f4c4b633cfeb6c8127dc
Port: 7777/TCP
Host Port: 0/TCP
Args:
-port=7777
-connz
-routez
-subz
-varz
-prefix=nats
-use_internal_server_id
-jsz=all
http://localhost:8222/
State: Running
Started: Sun, 20 Aug 2023 23:43:08 +0530
Ready: True
Restart Count: 0
Environment:
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-bvdpc (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
nats-jetstream-js:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: nats-jetstream-js-nats-jetstream-0
ReadOnly: false
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: nats-jetstream-config
Optional: false
pid:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit:
nats-tls:
Type: Secret (a volume populated by a Secret)
SecretName: nats-client-tls
Optional: false
kube-api-access-bvdpc:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: Burstable
Node-Selectors:
Tolerations: node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints: kubernetes.io/hostname:DoNotSchedule when max skew 1 is exceeded for selector app.kubernetes.io/component=nats,app.kubernetes.io/instance=nats,app.kubernetes.io/name=nats
Events:
Type     Reason                  Age                   From                     Message
Normal   Scheduled               4m7s                  default-scheduler        Successfully assigned nats/nats-jetstream-0 to aks-ondemand-20722617-vmss000002
Normal   SuccessfulAttachVolume  3m56s                 attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-2971d4c6-ab0c-4cbc-9f1e-332227b1de14"
Normal   Pulled                  3m55s                 kubelet                  Container image "nats:2.9.21-alpine" already present on machine
Normal   Created                 3m55s                 kubelet                  Created container nats
Normal   Started                 3m54s                 kubelet                  Started container nats
Normal   Pulled                  3m54s                 kubelet                  Container image "natsio/nats-server-config-reloader:0.11.0" already present on machine
Normal   Created                 3m54s                 kubelet                  Created container reloader
Normal   Started                 3m54s                 kubelet                  Started container reloader
Normal   Pulled                  3m54s                 kubelet                  Container image "natsio/prometheus-nats-exporter:0.12.0" already present on machine
Normal   Created                 3m54s                 kubelet                  Created container prom-exporter
Normal   Started                 3m54s                 kubelet                  Started container prom-exporter
Warning  Unhealthy               75s (x16 over 3m45s)  kubelet                  Startup probe failed: HTTP probe failed with statuscode: 400
[33] 2023/08/20 18:13:48.230733 [ERR] Could not find server_id: invalid character 'C' looking for beginning of value
[33] 2023/08/20 18:14:18.231273 [ERR] Could not find server_id: invalid character 'C' looking for beginning of value