openshift / hypershift

Hyperscale OpenShift - clusters with hosted control planes
https://hypershift-docs.netlify.app
Apache License 2.0

IBM Cloud Satellite: Outline supported workflow for user to enable CPU Pinning (CPU Manager), NUMA Aware scheduling (Memory Manager + Topology Manager) #1538

Closed relyt0925 closed 1 year ago

relyt0925 commented 2 years ago

Various Telco customers require the ability to pin their containerized applications to a specific set of CPUs on a specific NUMA node (and to ensure the associated memory is allocated from that same NUMA node). This is critical for the low-latency, high-performance characteristics this suite of applications demands.

The IBM Cloud Satellite team needs to outline the steps a user can follow to enable this feature set appropriately and walk them through the configuration values available to them. The associated upstream documentation:

- Memory Manager: https://kubernetes.io/docs/tasks/administer-cluster/memory-manager/
- CPU Manager: https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/
- Topology Manager: https://kubernetes.io/docs/tasks/administer-cluster/topology-manager/
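For orientation, these are all kubelet-level settings; a minimal sketch of the KubeletConfiguration fields involved (the values below are illustrative assumptions, not recommendations for any specific environment) looks roughly like:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static            # exclusive CPUs for Guaranteed pods with integer CPU requests
memoryManagerPolicy: Static         # guaranteed per-NUMA-node memory (and hugepage) allocation
topologyManagerPolicy: best-effort  # try to align CPU and memory assignments on the same NUMA node
# The Static memory manager requires reservedMemory to add up to the node's reserved memory
# (systemReserved + kubeReserved + hard eviction threshold).
reservedMemory:
  - numaNode: 0
    limits:
      memory: 1124Mi
systemReserved:
  cpu: "1"      # the static CPU manager requires a non-zero CPU reservation
  memory: 1Gi
evictionHard:
  memory.available: 100Mi

The example further down appends exactly these kinds of fields to /etc/kubernetes/kubelet.conf on the node.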

The documented automation must ultimately produce configuration that persists on the node across updates, reboots, etc. It is expected that the customer will tailor the general framework to the specific choices appropriate for their environment.

These docs may be adjusted or invalidated once a pure upstream solution is provided (a piece of it is outlined in https://github.com/openshift/hypershift/issues/1510). Until then, automation that works in the current environment should be documented. This can likely be a DaemonSet that installs a systemd unit on the node, ensuring the appropriate config is re-added to the kubelet.conf file on each update.

It should also define a workflow where the user explicitly identifies the nodes they want this enabled on, since it can only be enabled on NUMA-capable nodes and cannot be applied generically.
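A minimal sketch of how a user might identify NUMA-capable workers and opt them in; the feature.node.kubernetes.io/memory-numa label is the one the example DaemonSet below selects on, and Node Feature Discovery can set it automatically (the node name is a placeholder):

# On a candidate worker: confirm more than one NUMA node is actually exposed.
lscpu | grep -i 'numa node'
numactl --hardware   # if numactl is installed

# Opt the node in. Node Feature Discovery sets this label automatically when deployed;
# otherwise it can be applied by hand.
kubectl label node <worker-node-name> feature.node.kubernetes.io/memory-numa=true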

Acceptance Criteria:

relyt0925 commented 2 years ago

Potential example file that can help kickstart ideas in the area

apiVersion: v1
kind: ConfigMap
metadata:
  name: configurator
data:
  configure.sh: |
    #!/usr/bin/env bash
    set -x
    # Install the NUMA configuration script and its systemd unit onto the host filesystem.
    cp -f /scripts/ibm-numa-configuration.sh /host-usr-local-bin/ibm-numa-configuration.sh
    chmod 0755 /host-usr-local-bin/ibm-numa-configuration.sh
    cp -f /scripts/ibm-numa-configuration.service /host-etc-systemd-dir/ibm-numa-configuration.service
    chmod 0644 /host-etc-systemd-dir/ibm-numa-configuration.service
    # Reload systemd in the host's namespaces and enable the unit so it runs now and,
    # via its [Install] section, before the kubelet on every subsequent boot.
    nsenter -t 1 -m -u -i -n -p -- systemctl daemon-reload
    nsenter -t 1 -m -u -i -n -p -- systemctl enable --now ibm-numa-configuration.service
  ibm-numa-configuration.sh: |
    #!/usr/bin/env bash
    set -x
    # SYSTEM_RESERVED_MEMORY comes from /etc/node-sizing.env (e.g. "1Gi"); convert it to MiB and
    # add 100 (presumably the default 100Mi hard-eviction threshold) so the reservedMemory total
    # matches what the kubelet's Static memory manager policy expects.
    GIGABYTES_RESERVED_MEMORY=$(echo "$SYSTEM_RESERVED_MEMORY" | awk -F 'Gi' '{print $1}')
    TOTAL_NUMA_MEMORY_TO_ALLOCATE=$(echo "$GIGABYTES_RESERVED_MEMORY" "1024" | awk '{print $1 * $2 + 100}')
    # shellcheck disable=SC2154
    cat >/tmp/ibm-numa-config.conf <<EOF
    #START NUMA CONFIG
    topologyManagerPolicy: best-effort
    memoryManagerPolicy: Static
    cpuManagerPolicy: static
    reservedMemory:
      - numaNode: 0
        limits:
          memory: ${TOTAL_NUMA_MEMORY_TO_ALLOCATE}Mi
    #END NUMA CONFIG
    EOF
    # Drop any previously appended block first so the script stays idempotent across reruns.
    sed -i '/#START NUMA CONFIG/,/#END NUMA CONFIG/d' /etc/kubernetes/kubelet.conf
    cat /tmp/ibm-numa-config.conf >>/etc/kubernetes/kubelet.conf
  ibm-numa-configuration.service: |
    [Unit]
    Description=Add numa config to kubelet
    Before=kubelet.service
    After=kubelet-auto-node-size.service

    [Service]
    Type=oneshot
    RemainAfterExit=yes
    EnvironmentFile=/etc/node-sizing.env
    ExecStart=/usr/local/bin/ibm-numa-configuration.sh

    [Install]
    WantedBy=multi-user.target
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app: configurator
  name: configurator
spec:
  selector:
    matchLabels:
      app: configurator
  template:
    metadata:
      labels:
        app: configurator
    spec:
      nodeSelector:
        # Only schedule onto NUMA-capable workers (label set by Node Feature Discovery or by hand).
        feature.node.kubernetes.io/memory-numa: "true"
      tolerations:
        - operator: "Exists"
      hostPID: true  # required so the init container can nsenter into the host's PID 1 namespaces
      initContainers:
        - name: configure
          image: "registry.access.redhat.com/ubi8/ubi:8.6"
          command: ['/bin/bash', '-c', 'mkdir /cache && cp /scripts/configure.sh /cache && chmod +x /cache/configure.sh && /bin/bash /cache/configure.sh']
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /scripts
              name: script-config
            - mountPath: /host-etc-systemd-dir
              name: etc-systemd-dir
            - mountPath: /host-usr-local-bin
              name: usr-local-bin
      containers:
        - name: pause
          image: registry.ng.bluemix.net/armada-master/pause:3.2
      volumes:
        - name: etc-systemd-dir
          hostPath:
            path: /etc/systemd/system
        - name: usr-local-bin
          hostPath:
            path: /usr/local/bin
        - name: script-config
          configMap:
            name: configurator
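
A rough way to verify the result once something like the manifest above is applied (file name, node name, and the pod below are illustrative placeholders): check that the block was actually appended to kubelet.conf on a labeled node, then confirm pinning with a Guaranteed QoS pod, since only pods with integer CPU requests equal to limits receive exclusive CPUs under the static policy. Note that the kubelet refuses to change cpuManagerPolicy on a node that already has a CPU manager checkpoint, so /var/lib/kubelet/cpu_manager_state (and memory_manager_state) may need to be removed and the kubelet restarted the first time this is switched on.

oc apply -f ibm-numa-configurator.yaml
oc debug node/<worker-node-name> -- chroot /host \
  grep -A 10 '#START NUMA CONFIG' /etc/kubernetes/kubelet.conf

A minimal pod that would be pinned under this configuration:

apiVersion: v1
kind: Pod
metadata:
  name: cpu-pinned-example
spec:
  nodeSelector:
    feature.node.kubernetes.io/memory-numa: "true"
  containers:
    - name: app
      image: registry.access.redhat.com/ubi8/ubi:8.6
      command: ["sleep", "infinity"]
      resources:
        # Guaranteed QoS: requests == limits with an integer CPU count.
        requests:
          cpu: "2"
          memory: 2Gi
        limits:
          cpu: "2"
          memory: 2Gi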
openshift-bot commented 2 years ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot commented 1 year ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-bot commented 1 year ago

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen. Mark the issue as fresh by commenting /remove-lifecycle rotten. Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci[bot] commented 1 year ago

@openshift-bot: Closing this issue.

In response to [this](https://github.com/openshift/hypershift/issues/1538#issuecomment-1374277302):

> Rotten issues close after 30d of inactivity.
>
> Reopen the issue by commenting `/reopen`.
> Mark the issue as fresh by commenting `/remove-lifecycle rotten`.
> Exclude this issue from closing again by commenting `/lifecycle frozen`.
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.