Closed: sreuland closed this issue 3 months ago.
We should investigate the PVC-mapping implications of this, though.
yes, referring to the PVC lifecycle and how the deployment binds to the PVC? The assumed solution was to define a standalone PVC resource using the same default storage class in the yml manifest first, with the deployment defined after it. kubectl apply
then treats the PVC idempotently, i.e. it creates the PVC only the first time, when it detects that no PVC with that metadata:name
exists; any subsequent applies, when a PVC with the same name already exists, skip it, leaving the PVC and the data on the volume as-is. The deployment binds to the single PVC by name, and the deployment will fail if the PVC has not been created first or if the PVC is bound to another pod at the time.
this task should include the time necessary to test applying the new manifest file via kubectl repeatedly in the dev namespace first, to confirm the outcome is as expected.
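As a sketch of the intended shape (hypothetical names, sizes, and mount path, not the actual chart manifests), the standalone PVC and the Deployment that binds to it by name might look like:

```yaml
# Sketch only: names, storage size, and mount path are placeholders.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: soroban-rpc-data            # the deployment binds to this name
spec:
  accessModes: ["ReadWriteOnce"]
  # storageClassName omitted -> cluster default storage class
  resources:
    requests:
      storage: 100Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: soroban-rpc
spec:
  replicas: 1
  strategy:
    type: Recreate                  # release the RWO volume before the new pod attaches it
  selector:
    matchLabels: {app: soroban-rpc}
  template:
    metadata:
      labels: {app: soroban-rpc}
    spec:
      containers:
        - name: soroban-rpc
          image: docker.io/stellar/soroban-rpc:20.3.0
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: soroban-rpc-data   # pre-existing PVC is reused across applies
```

Re-applying this manifest leaves the already-bound PVC and its data in place, since the PVC object with that metadata:name already exists.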
I have attempted to replicate the scenario where the PVC gets corrupted when a rolling update is triggered on the rpc statefulset.
I was not able to replicate PVC corruption after initiating multiple rolling updates against the rpc statefulset; however, I can consistently replicate an rpc service downtime window of about 2-3 minutes right after the rolling upgrade begins.
I disabled flux on the statefulset first, then updated the statefulset's container image ref, which triggers k8s to initiate a rolling upgrade on the statefulset. I set up a watch on pod and event activity afterwards. This is configured for pubnet:
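For reference, a sketch of the commands used to suspend reconciliation and trigger the rollout (the Flux resource name and image tag are placeholder assumptions):

```shell
# Placeholder names/tags; adjust to the actual Flux resource and image.
# Suspend Flux reconciliation so it does not revert the manual change:
flux suspend kustomization soroban-rpc-pubnet-dev
# Update the container image ref, which triggers the rolling update:
kubectl -n soroban-rpc-pubnet-dev set image statefulset/soroban-rpc-pubnet-dev \
  soroban-rpc=docker.io/stellar/soroban-rpc:20.3.0
```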
$ kubectl get pods -n soroban-rpc-pubnet-dev -o wide --watch
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
soroban-rpc-pubnet-dev-0 1/1 Running 0 60m 192.168.38.83 ip-172-22-20-98.ec2.internal <none> <none>
.... statefulset is updated, triggers rolling upgrade
soroban-rpc-pubnet-dev-0 1/1 Terminating 0 61m 192.168.38.83 ip-172-22-20-98.ec2.internal <none> <none>
soroban-rpc-pubnet-dev-0 0/1 Terminating 0 61m 192.168.38.83 ip-172-22-20-98.ec2.internal <none> <none>
soroban-rpc-pubnet-dev-0 0/1 Terminating 0 61m 192.168.38.83 ip-172-22-20-98.ec2.internal <none> <none>
soroban-rpc-pubnet-dev-0 0/1 Terminating 0 61m 192.168.38.83 ip-172-22-20-98.ec2.internal <none> <none>
soroban-rpc-pubnet-dev-0 0/1 Terminating 0 61m 192.168.38.83 ip-172-22-20-98.ec2.internal <none> <none>
soroban-rpc-pubnet-dev-0 0/1 Terminating 0 61m 192.168.38.83 ip-172-22-20-98.ec2.internal <none> <none>
soroban-rpc-pubnet-dev-0 0/1 Pending 0 0s <none> <none> <none> <none>
soroban-rpc-pubnet-dev-0 0/1 Pending 0 0s <none> ip-172-22-20-98.ec2.internal <none> <none>
soroban-rpc-pubnet-dev-0 0/1 ContainerCreating 0 0s <none> ip-172-22-20-98.ec2.internal <none> <none>
soroban-rpc-pubnet-dev-0 0/1 ContainerCreating 0 2s <none> ip-172-22-20-98.ec2.internal <none> <none>
soroban-rpc-pubnet-dev-0 0/1 Running 0 6s 192.168.38.98 ip-172-22-20-98.ec2.internal <none> <none>
... the above activity prints within a couple of seconds, then a window of a couple of minutes passes before this next line prints, signaling that the rpc http service is up and the readiness probe passed
soroban-rpc-pubnet-dev-0 1/1 Running 0 2m51s 192.168.38.98 ip-172-22-20-98.ec2.internal <none> <none>
$ kubectl get events -n soroban-rpc-pubnet-dev --watch
LAST SEEN TYPE REASON OBJECT MESSAGE
0s Normal SuccessfulDelete statefulset/soroban-rpc-pubnet-dev delete Pod soroban-rpc-pubnet-dev-0 in StatefulSet soroban-rpc-pubnet-dev successful
0s Normal SuccessfulCreate statefulset/soroban-rpc-pubnet-dev create Pod soroban-rpc-pubnet-dev-0 in StatefulSet soroban-rpc-pubnet-dev successful
0s Normal Scheduled pod/soroban-rpc-pubnet-dev-0 Successfully assigned soroban-rpc-pubnet-dev/soroban-rpc-pubnet-dev-0 to ip-172-22-20-205.ec2.internal
0s Warning FailedAttachVolume pod/soroban-rpc-pubnet-dev-0 Multi-Attach error for volume "pvc-cc6b516c-00e7-45e9-a0de-fb4fbb3d6451" Volume is already exclusively attached to one node and can't be attached to another
0s Normal SuccessfulAttachVolume pod/soroban-rpc-pubnet-dev-0 AttachVolume.Attach succeeded for volume "pvc-cc6b516c-00e7-45e9-a0de-fb4fbb3d6451"
0s Normal Pulling pod/soroban-rpc-pubnet-dev-0 Pulling image "docker.io/stellar/soroban-rpc:20.3.0"
0s Normal Pulled pod/soroban-rpc-pubnet-dev-0 Successfully pulled image "docker.io/stellar/soroban-rpc:20.3.0" in 3.21401564s (3.214026454s including waiting)
0s Normal Created pod/soroban-rpc-pubnet-dev-0 Created container soroban-rpc
0s Normal Started pod/soroban-rpc-pubnet-dev-0 Started container soroban-rpc
0s Warning Unhealthy pod/soroban-rpc-pubnet-dev-0 Readiness probe failed:
... readiness probe failures repeat, and stop once the rpc http service passes the readiness probe a couple of minutes later
What happens in the 2-3 minute window between when the newly spun up statefulset pod first reports readiness 0/1 and when it reports readiness 1/1 is rpc http downtime. I confirmed it by attempting to access the rpc http service on the container and getting connection refused. The new pod is running on the cluster, and the rpc container is running as a process in the pod (verified by exec'ing into the pod/container and seeing the rpc process running), but the rpc service has not started its http listener yet, and it emits no logs during this initialization time even with LOG_LEVEL=DEBUG. rpc hasn't started the stellar-core captive process yet either, as core does not show up in the container process list.

This state lasts about 2-3 minutes, during which the kubelet's readiness probe checks for getHealth
against the pod are failing. The statefulset controller terminated the older pod earlier as part of the rolling update, as seen in the captured events, so it only has the newer pod, which is in this non-ready state. After 2-3 minutes, the stellar-core process appears in the container, the first log output from rpc appears as starting Soroban JSON RPC server,
and the http process appears on port 8000 and responds to getHealth
requests.
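One mitigation worth evaluating for the failing probes during the slow init (a sketch only, not the chart's actual probe config; the path, port, and timings are placeholder assumptions) is a startupProbe, so the multi-minute startup window does not count against the readinessProbe:

```yaml
# Placeholder probe config; the real health endpoint and port depend on how
# the chart exposes rpc health (getHealth on port 8000 is assumed here).
containers:
  - name: soroban-rpc
    startupProbe:
      httpGet:
        path: /getHealth
        port: 8000
      periodSeconds: 10
      failureThreshold: 30   # tolerates up to ~5 minutes of startup
    readinessProbe:
      httpGet:
        path: /getHealth
        port: 8000
      periodSeconds: 5
```

Until the startupProbe succeeds, the readinessProbe is not run, so the pod simply stays not-ready without accumulating probe failures during init.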
I think there are two separate/orthogonal fixes: fix the deployment to avoid the downtime period during rollout, and add more verbose startup logging before starting Soroban JSON RPC server
is logged.

cc: @2opremio
I ran some additional tests to verify the 2-3 minute lag behavior seen here on rpc; the behavior only occurs in a k8s pod/container as part of an update rollout. I verified that the same rpc version running outside k8s (quickstart) starts the http and metrics listeners, ingestion (captive core), and logging immediately after the rpc process is started on the o/s.
we continue this ticket, but change the scope to fixing the rpc deployment in general terms to avoid the downtime period during rollout: evaluate whether a Deployment with the Recreate update strategy retains the old pod until the new one passes its readinessProbe.
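For reference, the strategy fragment under evaluation would look like this (sketch only). One caveat to verify during evaluation: as documented, Recreate terminates the old pod before creating the new one, so on its own it does not keep the old pod serving until the new pod is ready; what it does guarantee is that the ReadWriteOnce volume is released before the new pod attaches it, avoiding the Multi-Attach error seen in the captured events.

```yaml
# Sketch: Deployment update strategy under evaluation.
spec:
  strategy:
    type: Recreate
```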
identified the silent startup issue as unrelated to the kubernetes aspects; it was due to rpc internal initialization - https://github.com/stellar/soroban-rpc/issues/128
add more verbose logging in rpc to emit status in the startup routine prior to the http listener being started, i.e. log the work being done before starting Soroban JSON RPC server is emitted.
after identification of https://github.com/stellar/soroban-rpc/issues/128 and https://github.com/stellar/soroban-rpc/pull/127, the remaining scope of this issue is now obsolete; further discussion with the team has narrowed rpc rolling-update downtime to be addressed in https://github.com/stellar/helm-charts/issues/82.
What problem does your feature solve?
RPC's usage of a StatefulSet on k8s may leave potential for problems during rolling upgrades.
What would you like to see?
The rpc rolling upgrade will not corrupt the underlying pvc and will not exhibit any http service downtime during the rollout.
Evaluate whether converting to a Deployment with the Recreate update strategy, and defining a standalone PVC which the deployment references for its volume mount, will fix the http service downtime aspect; that rollout should retain the old pod until the new pod passes its readinessProbe.
What alternatives are there?