pierDipi closed this pull request 2 years ago.
/test 410-br-p10-r3-ord-upstream-nightly-aws-ocp-410
/test 410-br-p10-r3-ord-upstream-nightly-aws-ocp-410
/retest
/test 410-br-p10-r3-ord-upstream-nightly-aws-ocp-410
Hyperfoil controller API responded with 504 while checking for termination:

```
Doing request /run/0000/stats/total
Traceback (most recent call last):
  File "./bin/run_benchmark.py", line 213, in <module>
    await_termination(run_id)
  File "./bin/run_benchmark.py", line 169, in await_termination
    while is_terminated(run_id) is False:
  File "./bin/run_benchmark.py", line 181, in is_terminated
    info = get_run_info(run_id)
  File "./bin/run_benchmark.py", line 110, in get_run_info
    raise Exception(f"failed to get run info, status code {response.status}")
Exception: failed to get run info, status code 504
```
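A 504 from the gateway in front of the controller is often transient, so the polling loop could retry instead of aborting the whole benchmark. A minimal sketch, assuming a hypothetical `fetch` callable standing in for the real HTTP request in run_benchmark.py (names here are illustrative, not the script's actual API):

```python
import time

# Transient gateway statuses worth retrying before giving up.
TRANSIENT = {502, 503, 504}

def get_run_info_with_retry(fetch, run_id, attempts=5, base_delay=1.0):
    """Poll run info, retrying transient errors with exponential backoff.

    `fetch(run_id)` is assumed to return a (status_code, body) tuple.
    """
    for attempt in range(attempts):
        status, body = fetch(run_id)
        if status == 200:
            return body
        if status in TRANSIENT and attempt < attempts - 1:
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
            continue
        raise Exception(f"failed to get run info, status code {status}")
```

With something like this, an isolated 504 during `await_termination` would cost a few seconds of backoff rather than failing the run.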
/test 410-br-p10-r3-ord-upstream-nightly-aws-ocp-410
```
subscription.operators.coreos.com/hyperfoil-bundle created
error: state is not found
```
Hyperfoil cluster controller is throwing Java Heap Space errors during the run (worker nodes have 4GB of memory):
Metrics
Logs: hyperfoil-cluster-controller.txt
Pod spec:
```yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "openshift-sdn",
          "interface": "eth0",
          "ips": [
              "10.129.8.6"
          ],
          "default": true,
          "dns": {}
      }]
    k8s.v1.cni.cncf.io/networks-status: |-
      [{
          "name": "openshift-sdn",
          "interface": "eth0",
          "ips": [
              "10.129.8.6"
          ],
          "default": true,
          "dns": {}
      }]
    openshift.io/scc: restricted
  creationTimestamp: "2022-06-24T14:02:28Z"
  labels:
    app: hyperfoil-cluster
    role: controller
  name: hyperfoil-cluster-controller
  namespace: hyperfoil
  ownerReferences:
  - apiVersion: hyperfoil.io/v1alpha2
    blockOwnerDeletion: true
    controller: true
    kind: Hyperfoil
    name: hyperfoil-cluster
    uid: 106e299a-7c45-46dc-99cf-ca70a6b868ac
  resourceVersion: "44124"
  uid: dd2077de-34d3-4bf1-b864-a9c4ffb80926
spec:
  containers:
  - command:
    - /deployment/bin/controller.sh
    - -Dio.hyperfoil.deploy.timeout=120000
    - -Dio.hyperfoil.deployer=k8s
    - -Dio.hyperfoil.deployer.k8s.namespace=hyperfoil
    - -Dio.hyperfoil.controller.host=0.0.0.0
    - -Dio.hyperfoil.controller.external.uri=https://hyperfoil-cluster-hyperfoil.apps.ci-ln-vfxt1cb-72292.origin-ci-int-gce.dev.rhcloud.com
    - -Dio.hyperfoil.rootdir=/var/hyperfoil/
    - -Djgroups.thread_pool.max_threads=500
    - -Dio.hyperfoil.controller.secured.via.proxy=true
    image: quay.io/hyperfoil/hyperfoil:latest
    imagePullPolicy: Always
    name: controller
    resources: {}
    securityContext:
      capabilities:
        drop:
        - KILL
        - MKNOD
        - SETGID
        - SETUID
      runAsUser: 1000670000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/hyperfoil
      name: hyperfoil
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-w4hgx
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - name: controller-dockercfg-jglff
  nodeName: ci-ln-vfxt1cb-72292-lt2sd-worker-a-tcn74
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1000670000
    seLinuxOptions:
      level: s0:c26,c10
  serviceAccount: controller
  serviceAccountName: controller
  terminationGracePeriodSeconds: 0
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - emptyDir: {}
    name: hyperfoil
  - name: kube-api-access-w4hgx
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
      - configMap:
          items:
          - key: service-ca.crt
            path: service-ca.crt
          name: openshift-service-ca.crt
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-06-24T14:02:28Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2022-06-24T14:02:43Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2022-06-24T14:02:43Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2022-06-24T14:02:28Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://4cddfde633b9c6187b4791cbfac26034be9957dc54383bd588352c389cdf1561
    image: quay.io/hyperfoil/hyperfoil:latest
    imageID: quay.io/hyperfoil/hyperfoil@sha256:7e5d387ad057eceb5f75aff9ef9f7be50dcc9e5fb952e7e524b21229cf3d7279
    lastState: {}
    name: controller
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2022-06-24T14:02:42Z"
  hostIP: 10.0.128.17
  phase: Running
  podIP: 10.129.8.6
  podIPs:
  - ip: 10.129.8.6
  qosClass: BestEffort
  startTime: "2022-06-24T14:02:28Z"
```
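Note that the controller container sets `resources: {}`, so the pod runs as `BestEffort` with no memory limit, and the JVM sizes its default heap from the memory it can see. Assuming the OpenJDK default of `-XX:MaxRAMPercentage=25.0` applies (an assumption; Hyperfoil's `controller.sh` may pass its own heap flags), a 4 GiB worker node would yield roughly a 1 GiB max heap, which is consistent with hitting heap-space errors under heavy stats load. Illustrative arithmetic only:

```python
# Back-of-envelope default JVM heap sizing. Assumptions (not measured):
# no container memory limit, so the JVM sees the whole 4 GiB node, and
# the OpenJDK default -XX:MaxRAMPercentage=25.0 is in effect.
node_memory_gib = 4
max_ram_percentage = 25.0
default_max_heap_gib = node_memory_gib * max_ram_percentage / 100
print(default_max_heap_gib)  # → 1.0
```

Setting an explicit memory request/limit on the controller (or an explicit `-Xmx`) would make the heap size deliberate rather than an accident of node size.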
Hi @rvansa @willr3, is the high memory consumption of the Hyperfoil cluster controller expected? https://github.com/openshift-knative/eventing-hyperfoil-benchmark/pull/52#issuecomment-1165737053
We have at least 100 pods connected to the Hyperfoil cluster controller, each sending metrics every second. What would you recommend to lower memory consumption?
@pierDipi Hi, I haven't spun up that many sources for statistics myself; I guess that some statistics might be held per-node and per-second, and when an instance hosts a histogram (several kilobytes) the total usage grows rather high. Could you grab a heap dump from the controller and send it over? I don't think there's a switch that would lower the usage right now, but I could come up with some options to reduce the precision, increase the timeout, stop keeping stats per-node, etc.
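The per-node, per-second accounting described above can be put into rough numbers. All figures here are illustrative assumptions (histogram size, run length, full retention in memory), not measurements of Hyperfoil itself:

```python
# Rough estimate of controller-side stats retention. Assumptions:
# 100 connected sources, one histogram snapshot per source per second,
# ~8 KiB per serialized histogram, a 10-minute run, and every snapshot
# kept in heap for the whole run.
sources = 100
snapshots_per_second = 1
histogram_kib = 8
run_seconds = 10 * 60

total_mib = sources * snapshots_per_second * histogram_kib * run_seconds / 1024
print(round(total_mib))  # → 469 (MiB)
```

Even with these modest assumptions the retained stats approach half a gibibyte, so on a small default heap the OOM behavior reported above is plausible; reducing precision or dropping per-node retention, as suggested, attacks exactly the `sources * histogram_kib` factors.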
infra /retest-required
* could not initialize namespace: couldn't create secret registry-pull-credentials: secrets "registry-pull-credentials" is forbidden: unable to create new content in namespace ci-op-5t43dh6n because it is being terminated
/retest-required
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: matzew, pierDipi
The full list of commands accepted by this bot can be found here.
The pull request process is described here
Signed-off-by: Pierangelo Di Pilato pierdipi@redhat.com