@GregHanson can you add a standup-style update here async?
Still in the process of updating the perf scripts for the online-boutique example - focusing on xDS evolution work.
Started to look into this, hope to have the app running today!
Updating the scripts for this; got the app running and a test completing. @GregHanson can send the data he has when the test finishes.
Got some results; they appear to be consistent. May need help validating whether traffic is actually being intercepted by the waypoints.
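One way to sanity-check waypoint interception (a rough sketch - the waypoint deployment name depends on the Istio version and on what the waypoint Gateway was named, so the names below are placeholders):

    # Drive some test traffic, then check whether the waypoint proxy logged it.
    kubectl logs -n <app-namespace> deploy/<waypoint-deployment> --tail=50

    # ztunnel logs also record the connections it forwards; grep for the waypoint.
    kubectl logs -n istio-system ds/ztunnel --tail=200 | grep -i waypoint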
For L4: CPU savings are about 80%, memory savings about 99.5%.
4 namespaces, each with its own boutique app; each deployment has 3 replicas.
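For reference, a sketch of the kind of comparison behind numbers like these (hypothetical PromQL over standard cAdvisor metrics - not necessarily the exact queries the perf scripts use):

    # Sidecar run: total CPU spent in istio-proxy containers.
    sum(rate(container_cpu_usage_seconds_total{container="istio-proxy"}[5m]))

    # Ambient L4 run: the only per-node dataplane cost is ztunnel.
    sum(rate(container_cpu_usage_seconds_total{namespace="istio-system",pod=~"ztunnel.*"}[5m]))

    # saving = 1 - (ambient / sidecar); ~0.80 for CPU in the runs above.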
We have proper runs now; he nixed the original blog draft and is waiting for John's feedback when he gets online.
Will be working on the review comments on the blog today.
Blog content updated here: https://github.com/istio/istio.io/pull/13179
John likes these numbers a lot better. Ran another test where the size of the environment (pods, svcs, etc.) was doubled; the percentages scale ~linearly.
Ran more tests yesterday; wanted a single-node run. Results are similar - the number of nodes does not affect them. Pinged John for review.
Keith Mattix has an intern also running some performance numbers; don't think it will impact the blog. Lin will hopefully be reviewing the blog today.
Left a bunch of comments; the blog needs to be clearer and more precise on a few points, IMHO.
Addressed most of Lin's comments. Greg will do another pass to check the percentages for consistency.
Synced with Lin last week; most changes are already addressed. Also prepared for the hoot tomorrow on the same topic. Today will work on getting a live env/demo ready.
Prepping for the hoot; getting the env up and running.
Key question: it's unclear why the online boutique app workloads use different amounts of CPU/memory when comparing sidecars vs. ambient.
A bug in the query was the root cause of the odd results, but the results are largely unchanged, so the blog is still good. Just need to update some charts with the new data.
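The exact query bug isn't recorded here, but for illustration, a classic one with cAdvisor data (namespace label is hypothetical): cAdvisor exports a pod-level aggregate series with an empty container label alongside the per-container series, so an unfiltered sum double-counts everything:

    # Double-counts: also picks up the container="" pod-aggregate rows.
    sum(rate(container_cpu_usage_seconds_total{namespace="boutique"}[5m]))

    # Filtered to real containers only.
    sum(rate(container_cpu_usage_seconds_total{namespace="boutique",container!=""}[5m]))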
Lin added one more comment; Greg to address it today. Otherwise ready to ship. It's just wording in the blog - no need to rerun the tests.
Better ambient diagrams were requested.
Will see if the blog needs the diagrams or if we can move forward without them.
@GregHanson to set up a time with @craigbox and Lin to discuss next steps for the blog.
I'm fine with this content going into a blog if it's useful to have it out now, but I've expressed an interest in having a very strong (and reproducible) "cost of ambient vs. sidecars and other mesh models" page on istio.io in Q1. I would hope that this blog post would be almost all the source material required for that!
Able to sync with Andrea Ma on accessing the bare-metal env. Please reach out to Ihor directly to resolve the machine issue.
Got access - should be able to run the performance test when time permits.
Need to wait until Andre finishes the 1.20 performance tests.
Got some successful runs on the CNCF env, but having trouble viewing the Grafana dashboard for containers.
There is an issue with cAdvisor when Kubernetes is installed on the CNCF hardware; see the issue here.
Kubernetes removed the embedded cAdvisor container metrics from the kubelet. It appears KinD, k3d, and GKE have restored support for this (example). There is a KEP upstream that keeps getting pushed out: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2371-cri-pod-container-stats/README.md#metricscadvisor
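A quick way to check whether a node's kubelet still serves per-container cAdvisor metrics (standard kubelet proxy endpoint; <node-name> is a placeholder):

    # If this prints nothing, the embedded per-container metrics are gone and
    # the standalone cadvisor daemonset workaround below is needed.
    kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics/cadvisor" \
      | grep '^container_cpu_usage_seconds_total' | head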
Workaround: deploy our own cAdvisor DaemonSet and configure Prometheus to scrape metrics from the new source:
kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app: cadvisor
  name: cadvisor
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app: cadvisor
  name: cadvisor
rules:
- apiGroups:
  - policy
  resourceNames:
  - cadvisor
  resources:
  - podsecuritypolicies
  verbs:
  - use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app: cadvisor
  name: cadvisor
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cadvisor
subjects:
- kind: ServiceAccount
  name: cadvisor
  namespace: kube-system
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    seccomp.security.alpha.kubernetes.io/pod: docker/default
  labels:
    app: cadvisor
  name: cadvisor
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: cadvisor
      name: cadvisor
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        app: cadvisor
        name: cadvisor
    spec:
      automountServiceAccountToken: false
      containers:
      - args:
        - --housekeeping_interval=10s
        - --max_housekeeping_interval=15s
        - --event_storage_event_limit=default=0
        - --event_storage_age_limit=default=0
        - --enable_metrics=app,cpu,disk,diskIO,memory,network,process
        - --docker_only
        - --store_container_labels=false
        - --whitelisted_container_labels=io.kubernetes.container.name,io.kubernetes.pod.name,io.kubernetes.pod.namespace
        image: gcr.io/cadvisor/cadvisor:v0.45.0
        name: cadvisor
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        resources:
          limits:
            cpu: 800m
            memory: 2000Mi
          requests:
            cpu: 400m
            memory: 400Mi
        volumeMounts:
        - mountPath: /rootfs
          name: rootfs
          readOnly: true
        - mountPath: /var/run
          name: var-run
          readOnly: true
        - mountPath: /sys
          name: sys
          readOnly: true
        - mountPath: /var/lib/docker
          name: docker
          readOnly: true
        - mountPath: /dev/disk
          name: disk
          readOnly: true
      priorityClassName: system-node-critical
      serviceAccountName: cadvisor
      terminationGracePeriodSeconds: 30
      tolerations:
      - key: node-role.kubernetes.io/controlplane
        value: "true"
        effect: NoSchedule
      - key: node-role.kubernetes.io/etcd
        value: "true"
        effect: NoExecute
      volumes:
      - hostPath:
          path: /
        name: rootfs
      - hostPath:
          path: /var/run
        name: var-run
      - hostPath:
          path: /sys
        name: sys
      - hostPath:
          path: /var/lib/docker
        name: docker
      - hostPath:
          path: /dev/disk
        name: disk
---
apiVersion: v1
kind: Service
metadata:
  name: cadvisor
  labels:
    app: cadvisor
  namespace: kube-system
spec:
  selector:
    app: cadvisor
  ports:
  - name: cadvisor
    port: 8080
    protocol: TCP
    targetPort: 8080
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: cadvisor
  name: cadvisor
  namespace: kube-system
spec:
  endpoints:
  - metricRelabelings:
    - sourceLabels:
      - container_label_io_kubernetes_pod_name
      targetLabel: pod
    - sourceLabels:
      - container_label_io_kubernetes_container_name
      targetLabel: container
    - sourceLabels:
      - container_label_io_kubernetes_pod_namespace
      targetLabel: namespace
    - action: labeldrop
      regex: container_label_io_kubernetes_pod_name
    - action: labeldrop
      regex: container_label_io_kubernetes_container_name
    - action: labeldrop
      regex: container_label_io_kubernetes_pod_namespace
    port: cadvisor
    relabelings:
    - sourceLabels:
      - __meta_kubernetes_pod_node_name
      targetLabel: node
    - sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
      replacement: /metrics/cadvisor
    - sourceLabels:
      - job
      targetLabel: job
      replacement: kubelet
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      app: cadvisor
EOF
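A minimal smoke test that the DaemonSet is up and exporting container series before pointing dashboards at it (names taken from the manifest above):

    kubectl -n kube-system rollout status ds/cadvisor
    kubectl -n kube-system port-forward svc/cadvisor 8080:8080 &
    curl -s localhost:8080/metrics | grep '^container_memory_working_set_bytes' | head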
Updated Helm install command:
helm upgrade --install kube-prometheus-stack \
  prometheus-community/kube-prometheus-stack \
  --version 55.5.1 \
  --namespace monitoring \
  --create-namespace \
  --values - <<EOF
alertmanager:
  enabled: false
kubeStateMetrics:
  enabled: false
nodeExporter:
  enabled: true
kubelet:
  enabled: true
prometheus:
  prometheusSpec:
    ruleSelectorNilUsesHelmValues: false
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false
EOF
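With serviceMonitorSelectorNilUsesHelmValues: false, Prometheus should pick up the cadvisor ServiceMonitor without extra labels. A rough check (the Service name assumes the chart's default naming for this release name):

    kubectl -n monitoring port-forward svc/kube-prometheus-stack-prometheus 9090:9090 &
    curl -s 'localhost:9090/api/v1/targets?state=active' | grep -c cadvisor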
TODO: need to open a branch in istio.io to continue https://github.com/istio/istio.io/pull/13179
🚧 This issue or pull request has been closed due to not having had activity from an Istio team member since 2024-01-31. If you feel this issue or pull request deserves attention, please reopen the issue. Please see this wiki page for more information. Thank you for your contributions.
Created by the issue and PR lifecycle manager.
Closing this as I am not actively working on it. I don't think Craig is either. Please reopen if needed @craigbox
Needs to be updated to use more realistic sample apps.
https://github.com/istio/istio.io/pull/13179