kpouget opened this issue 10 months ago
I could work around the issue by increasing the memory limit of the Istio egress/ingress Pods (to 4GB, to be safe):
apiVersion: maistra.io/v2
kind: ServiceMeshControlPlane
metadata:
  name: minimal
  namespace: istio-system
spec:
  gateways:
    egress:
      runtime:
        container:
          resources:
            limits:
              memory: 4Gi
    ingress:
      runtime:
        container:
          resources:
            limits:
              memory: 4Gi
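A side note on units. The text says 4GB, the manifest says 4Gi, and the script later in this thread uses 4G. Kubernetes treats these suffixes differently: Gi is binary (2^30 bytes) and G is decimal (10^9 bytes), so 4Gi is about 7% larger than 4G. The bash helper below converts quantities so they can be compared. It is a minimal sketch that handles only the common suffixes, not the full Kubernetes quantity grammar, and the function name to_bytes is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a Kubernetes memory quantity (4Gi, 4G, 1024Mi...)
# to bytes. Sketch only: integers with Ki/Mi/Gi/k/M/G suffixes, no decimals
# or exponent notation.
to_bytes() {
  local q="$1"
  local num="${q%%[!0-9]*}"    # leading digits
  local suffix="${q#"$num"}"   # whatever follows the digits
  case "$suffix" in
    "")  echo "$num" ;;
    Ki)  echo $(( num * 1024 )) ;;                # binary suffixes
    Mi)  echo $(( num * 1024 * 1024 )) ;;
    Gi)  echo $(( num * 1024 * 1024 * 1024 )) ;;
    k)   echo $(( num * 1000 )) ;;                # decimal suffixes
    M)   echo $(( num * 1000 * 1000 )) ;;
    G)   echo $(( num * 1000 * 1000 * 1000 )) ;;
    *)   echo "unsupported quantity: $q" >&2; return 1 ;;
  esac
}
```

For example, `to_bytes 4Gi` prints 4294967296 and `to_bytes 4G` prints 4000000000.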
But this wasn't happening a few weeks ago with RHOAI 2.1.0 and 300 models (running on AWS with 35 nodes, whereas this bug occurred on single-node OpenShift).
Could this be a regression, or is it somehow expected?
@kpouget I am wondering if we can get some insights into these metrics as well:
pilot_xds_push_time_bucket
pilot_proxy_convergence_time_bucket
pilot_proxy_queue_time_bucket
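These are Prometheus histogram buckets, so they are normally read through histogram_quantile. The bash sketch below prints a p99 query for each of the three metrics, ready to paste into the Prometheus UI used by the mesh. The 5m rate window and the `sum by (le)` aggregation are assumptions, and the helper name pilot_quantile_query is hypothetical. Adjust the window and aggregation to your scrape interval and labels.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a PromQL quantile query for an istiod histogram.
# The default quantile (0.99) and rate window (5m) are assumptions.
pilot_quantile_query() {
  local metric="$1"            # e.g. pilot_xds_push_time_bucket
  local quantile="${2:-0.99}"
  local window="${3:-5m}"
  printf 'histogram_quantile(%s, sum by (le) (rate(%s[%s])))\n' \
    "$quantile" "$metric" "$window"
}

# Print one query per metric mentioned above.
for m in pilot_xds_push_time_bucket \
         pilot_proxy_convergence_time_bucket \
         pilot_proxy_queue_time_bucket; do
  pilot_quantile_query "$m"
done
```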
@kpouget was the RHOAI 2.1.0 setup also running on Istio underneath? If so, how was it configured?
Yes, it was. Istio was configured with these files (pinned to the commit I used at the time of the test).
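I managed to reduce resource consumption roughly by half. Here's the script you can apply.
In short, this script sets CPU/memory requests and limits for the ingress/egress gateways and istiod, enables PILOT_FILTER_GATEWAY_CLUSTER_CONFIG on istiod, adds a default Sidecar to each scale-test namespace so its proxies only see endpoints in their own namespace and in istio-system, and then deletes the pods in those namespaces and in istio-system so the changes take effect.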
#!/bin/bash

cat <<EOF > smcp-patch.yaml
apiVersion: maistra.io/v2
kind: ServiceMeshControlPlane
metadata:
  name: data-science-smcp
  namespace: istio-system
spec:
  gateways:
    egress:
      runtime:
        container:
          resources:
            limits:
              cpu: 1024m
              memory: 4G
            requests:
              cpu: 128m
              memory: 1G
    ingress:
      runtime:
        container:
          resources:
            limits:
              cpu: 1024m
              memory: 4G
            requests:
              cpu: 128m
              memory: 1G
  runtime:
    components:
      pilot:
        container:
          env:
            PILOT_FILTER_GATEWAY_CLUSTER_CONFIG: "true"
          resources:
            limits:
              cpu: 1024m
              memory: 4G
            requests:
              cpu: 128m
              memory: 1024Mi
EOF

trap '{ rm -rf -- smcp-patch.yaml; }' EXIT

kubectl patch smcp/data-science-smcp -n istio-system --type=merge --patch-file smcp-patch.yaml

namespaces=$(kubectl get ns -ltopsail.scale-test -o name | cut -d'/' -f 2)

# limit sidecar proxy endpoints to its own ns and istio-system
for ns in $namespaces; do
cat <<EOF | kubectl apply -f -
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default
  namespace: $ns
spec:
  egress:
  - hosts:
    - "./*"
    - "istio-system/*"
EOF
done

# force changes to take effect
for ns in $namespaces; do
  kubectl delete pods --all -n "${ns}"
done

# force re-creation of all pods with envoy service registry rebuilt
kubectl delete pods --all -n istio-system
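To review the Sidecar objects before touching the cluster, you can render the same spec as one multi-document YAML stream. The sketch below does this. The helper name render_sidecars is hypothetical, and it only prints YAML without calling kubectl.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one Sidecar per namespace given as arguments,
# using the same spec as the apply loop in the script, separated by "---".
render_sidecars() {
  local ns
  for ns in "$@"; do
# heredoc body starts at column 0 so the YAML indentation is preserved
cat <<EOF
---
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default
  namespace: ${ns}
spec:
  egress:
  - hosts:
    - "./*"
    - "istio-system/*"
EOF
  done
}
```

For instance, `render_sidecars $namespaces > sidecars.yaml` followed by `kubectl apply --dry-run=server -f sidecars.yaml` would let the API server validate the objects without persisting them.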
Before applying the script:
❯ istioctl proxy-config endpoint deployment/istio-ingressgateway -n istio-system | wc -l
1052
❯ istioctl proxy-config endpoint $(kubectl get pods -o name -n watsonx-scale-test-u1) -n watsonx-scale-test-u1 | wc -l
1065
❯ kubectl top pods -n istio-system
NAME CPU(cores) MEMORY(bytes)
istio-egressgateway-6b7fdb6cb9-lh5jg 100m 2519Mi
istio-ingressgateway-7dbdc66dd7-nkxxq 91m 2320Mi
istiod-data-science-smcp-65f4877fff-tndf4 82m 1392Mi
❯ kubectl top pods -n watsonx-scale-test-u0 --containers
POD NAME CPU(cores) MEMORY(bytes)
u0-m0-predictor-00001-deployment-c46f9d59-jv9pq POD 0m 0Mi
u0-m0-predictor-00001-deployment-c46f9d59-jv9pq istio-proxy 14m 372Mi
...
After applying the script:
❯ istioctl proxy-config endpoint deployment/istio-ingressgateway -n istio-system | wc -l
1052 // it knows the whole world, so that is the same
❯ istioctl proxy-config endpoint $(kubectl get pods -o name -n watsonx-scale-test-u1) -n watsonx-scale-test-u1 | wc -l
34
❯ kubectl top pods -n istio-system
NAME CPU(cores) MEMORY(bytes)
istio-egressgateway-5778df8594-j869r 83m 444Mi
istio-ingressgateway-6847d4b974-sk25z 77m 946Mi
istiod-data-science-smcp-5568884d7d-45zkz 36m 950Mi
❯ kubectl top pods -n watsonx-scale-test-u0 --containers
POD NAME CPU(cores) MEMORY(bytes)
u0-m0-predictor-00001-deployment-c46f9d59-jv9pq POD 0m 0Mi
u0-m0-predictor-00001-deployment-c46f9d59-jv9pq istio-proxy 6m 136Mi
...
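To compare the two sets of kubectl top outputs above without summing by hand, here is a small awk-based sketch that adds up the memory column (the last field) read from stdin. It assumes the layout shown above: a header line, then every value in Mi. The helper name sum_memory_mi is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum the MEMORY(bytes) column of `kubectl top pods`
# output. Assumes a header line and values in Mi (e.g. 2519Mi); lines such
# as "..." contribute 0.
sum_memory_mi() {
  awk 'NR > 1 { sub(/Mi$/, "", $NF); total += $NF } END { print total + 0 }'
}
```

Usage: `kubectl top pods -n istio-system | sum_memory_mi`. For the istio-system outputs above, this gives 6231 for the first set and 2340 for the second.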
As part of my automated scale test, I observe that the InferenceService sometimes reports as "Loaded", but calls to the gRPC endpoint return errors. Examples:
Versions