pierDipi closed this pull request 2 years ago.
/test 410-br-p10-r3-ord-upstream-nightly-aws-ocp-410
/test 410-br-p10-r3-ord-upstream-nightly-aws-ocp-410
/retest
/test 410-br-p10-r3-ord-upstream-nightly-aws-ocp-410
Hyperfoil controller API responded with 504 while checking for termination:

```
Doing request /run/0000/stats/total
Traceback (most recent call last):
  File "./bin/run_benchmark.py", line 213, in <module>
    await_termination(run_id)
  File "./bin/run_benchmark.py", line 169, in await_termination
    while is_terminated(run_id) is False:
  File "./bin/run_benchmark.py", line 181, in is_terminated
    info = get_run_info(run_id)
  File "./bin/run_benchmark.py", line 110, in get_run_info
    raise Exception(f"failed to get run info, status code {response.status}")
Exception: failed to get run info, status code 504
```
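A 504 from the gateway in front of the controller is often transient, so the polling loop could retry instead of aborting the whole benchmark. A minimal sketch, assuming a hypothetical `fetch` callable standing in for the real HTTP request in run_benchmark.py (names here are illustrative, not the script's actual API):

```python
import time

# Transient gateway statuses worth retrying before giving up.
TRANSIENT = {502, 503, 504}

def get_run_info_with_retry(fetch, run_id, attempts=5, base_delay=1.0):
    """Poll run info, retrying transient errors with exponential backoff.

    `fetch(run_id)` is assumed to return a (status_code, body) tuple.
    """
    for attempt in range(attempts):
        status, body = fetch(run_id)
        if status == 200:
            return body
        if status in TRANSIENT and attempt < attempts - 1:
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
            continue
        raise Exception(f"failed to get run info, status code {status}")
```

With something like this, an isolated 504 during `await_termination` would cost a few seconds of backoff rather than failing the run.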
/test 410-br-p10-r3-ord-upstream-nightly-aws-ocp-410
```
subscription.operators.coreos.com/hyperfoil-bundle created
error: state is not found
```
Hyperfoil cluster controller is throwing Java Heap Space errors during the run (worker nodes have 4GB of memory):
Metrics
Logs: hyperfoil-cluster-controller.txt
Pod spec:
```yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "openshift-sdn",
          "interface": "eth0",
          "ips": [
              "10.129.8.6"
          ],
          "default": true,
          "dns": {}
      }]
    k8s.v1.cni.cncf.io/networks-status: |-
      [{
          "name": "openshift-sdn",
          "interface": "eth0",
          "ips": [
              "10.129.8.6"
          ],
          "default": true,
          "dns": {}
      }]
    openshift.io/scc: restricted
  creationTimestamp: "2022-06-24T14:02:28Z"
  labels:
    app: hyperfoil-cluster
    role: controller
  name: hyperfoil-cluster-controller
  namespace: hyperfoil
  ownerReferences:
  - apiVersion: hyperfoil.io/v1alpha2
    blockOwnerDeletion: true
    controller: true
    kind: Hyperfoil
    name: hyperfoil-cluster
    uid: 106e299a-7c45-46dc-99cf-ca70a6b868ac
  resourceVersion: "44124"
  uid: dd2077de-34d3-4bf1-b864-a9c4ffb80926
spec:
  containers:
  - command:
    - /deployment/bin/controller.sh
    - -Dio.hyperfoil.deploy.timeout=120000
    - -Dio.hyperfoil.deployer=k8s
    - -Dio.hyperfoil.deployer.k8s.namespace=hyperfoil
    - -Dio.hyperfoil.controller.host=0.0.0.0
    - -Dio.hyperfoil.controller.external.uri=https://hyperfoil-cluster-hyperfoil.apps.ci-ln-vfxt1cb-72292.origin-ci-int-gce.dev.rhcloud.com
    - -Dio.hyperfoil.rootdir=/var/hyperfoil/
    - -Djgroups.thread_pool.max_threads=500
    - -Dio.hyperfoil.controller.secured.via.proxy=true
    image: quay.io/hyperfoil/hyperfoil:latest
    imagePullPolicy: Always
    name: controller
    resources: {}
    securityContext:
      capabilities:
        drop:
        - KILL
        - MKNOD
        - SETGID
        - SETUID
      runAsUser: 1000670000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/hyperfoil
      name: hyperfoil
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-w4hgx
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - name: controller-dockercfg-jglff
  nodeName: ci-ln-vfxt1cb-72292-lt2sd-worker-a-tcn74
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1000670000
    seLinuxOptions:
      level: s0:c26,c10
  serviceAccount: controller
  serviceAccountName: controller
  terminationGracePeriodSeconds: 0
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - emptyDir: {}
    name: hyperfoil
  - name: kube-api-access-w4hgx
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
      - configMap:
          items:
          - key: service-ca.crt
            path: service-ca.crt
          name: openshift-service-ca.crt
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-06-24T14:02:28Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2022-06-24T14:02:43Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2022-06-24T14:02:43Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2022-06-24T14:02:28Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://4cddfde633b9c6187b4791cbfac26034be9957dc54383bd588352c389cdf1561
    image: quay.io/hyperfoil/hyperfoil:latest
    imageID: quay.io/hyperfoil/hyperfoil@sha256:7e5d387ad057eceb5f75aff9ef9f7be50dcc9e5fb952e7e524b21229cf3d7279
    lastState: {}
    name: controller
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2022-06-24T14:02:42Z"
  hostIP: 10.0.128.17
  phase: Running
  podIP: 10.129.8.6
  podIPs:
  - ip: 10.129.8.6
  qosClass: BestEffort
  startTime: "2022-06-24T14:02:28Z"
```
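Note that the controller container sets `resources: {}`, so the pod runs as `BestEffort` with no memory limit, and the JVM sizes its default heap from the memory it can see. Assuming the OpenJDK default of `-XX:MaxRAMPercentage=25.0` applies (an assumption; Hyperfoil's `controller.sh` may pass its own heap flags), a 4 GiB worker node would yield roughly a 1 GiB max heap, which is consistent with hitting heap-space errors under heavy stats load. Illustrative arithmetic only:

```python
# Back-of-envelope default JVM heap sizing. Assumptions (not measured):
# no container memory limit, so the JVM sees the whole 4 GiB node, and
# the OpenJDK default -XX:MaxRAMPercentage=25.0 is in effect.
node_memory_gib = 4
max_ram_percentage = 25.0
default_max_heap_gib = node_memory_gib * max_ram_percentage / 100
print(default_max_heap_gib)  # → 1.0
```

Setting an explicit memory request/limit on the controller (or an explicit `-Xmx`) would make the heap size deliberate rather than an accident of node size.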
Hi @rvansa @willr3, is the high memory consumption of the Hyperfoil cluster controller expected? https://github.com/openshift-knative/eventing-hyperfoil-benchmark/pull/52#issuecomment-1165737053
We have at least 100 pods connected to the Hyperfoil cluster controller, each sending metrics every second. What would you recommend to lower memory consumption?
@pierDipi Hi, I haven't spun up that many sources for statistics myself; I guess that some statistics might be held per-node and per-second, and when an instance hosts a histogram (several kilobytes) the total usage grows rather high. Could you grab a heap dump from the controller and send it over? I don't think there's a switch that would lower the usage right now, but I could come up with some options to reduce the precision, increase the timeout, stop keeping stats per-node, etc.
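The per-node, per-second accounting described above can be put into rough numbers. All figures here are illustrative assumptions (histogram size, run length, full retention in memory), not measurements of Hyperfoil itself:

```python
# Rough estimate of controller-side stats retention. Assumptions:
# 100 connected sources, one histogram snapshot per source per second,
# ~8 KiB per serialized histogram, a 10-minute run, and every snapshot
# kept in heap for the whole run.
sources = 100
snapshots_per_second = 1
histogram_kib = 8
run_seconds = 10 * 60

total_mib = sources * snapshots_per_second * histogram_kib * run_seconds / 1024
print(round(total_mib))  # → 469 (MiB)
```

Even with these modest assumptions the retained stats approach half a gibibyte, so on a small default heap the OOM behavior reported above is plausible; reducing precision or dropping per-node retention, as suggested, attacks exactly the `sources * histogram_kib` factors.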
infra /retest-required
* could not initialize namespace: couldn't create secret registry-pull-credentials: secrets "registry-pull-credentials" is forbidden: unable to create new content in namespace ci-op-5t43dh6n because it is being terminated
/retest-required
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: matzew, pierDipi
The full list of commands accepted by this bot can be found here.
The pull request process is described here
Signed-off-by: Pierangelo Di Pilato pierdipi@redhat.com