intel / platform-aware-scheduling

Enabling Kubernetes to make pod placement decisions with platform intelligence.
Apache License 2.0
170 stars 44 forks source link

configure-scheduler.sh script breaks Kubernetes config #126

Open eero-t opened 1 year ago

eero-t commented 1 year ago

Updating Kubernetes scheduler config using the script & (GAS) config in this repo, breaks Kubernetes 1.26 config: ./telemetry-aware-scheduling/deploy/extender-configuration/configure-scheduler.sh -f ./gpu-aware-scheduling/deploy/extender-configuration/scheduler-config-tas+gas.yaml

It messed up the last volume specification in the config so that there were couple of extra lines for the last volume specification.

Besides fixing that in the script, I think script should take a backup of the config file, and show user a diff of the changes it did.

Maybe also ask user whether changes should be accepted (unless something like "--yes" is specified), like e.g. Debian configuration packages script updates do.

You might also consider using kustomize to do the file updates, as that is actually designed for semantic updating of k8s YAML files.

(Which still leaves the issue of upgrade overwriting the changes, as mentioned in https://github.com/intel/platform-aware-scheduling/issues/86.)

togashidm commented 1 year ago

Hi @eero-t

Thank you for flag this and for your suggestions. We are looking into them.

madalazar commented 1 year ago

Hi @eero-t would you be able to share both the output of the configure-scheduler.sh run (the output of the command you pasted above) and resulting kube-scheduler.yaml output? I'm expecting the script changed the file in a way that the scheduler would not start anymore. If that's not available, could you share the current kube-scheduler manifest? I would like to understand what was the problem in the first place and without it it's pretty much impossible.

As for the other suggestions, I have added them to our backlog. Will let you know when they will be picked up

eero-t commented 1 year ago

As script did not provide a backup (and I was not the person updating that cluster to k8s 1.26), I do now know what the starting point was.

If I run the command now (with fixed config), it outputs following:

# ./telemetry-aware-scheduling/deploy/extender-configuration/configure-scheduler.sh -f ./gpu-aware-scheduling/deploy/extender-configuration/scheduler-config-tas+gas.yaml
Manifest file is located at: /etc/kubernetes/manifests/kube-scheduler.yaml
Scheduler config file is located at: ./gpu-aware-scheduling/deploy/extender-configuration/scheduler-config-tas+gas.yaml
Scheduler config destination will be: /etc/kubernetes
Version of the image used in the kube scheduler is: 26
Version of the KubeScheduler API: v1
Will proceed to copy the scheduler configuration to its destination path: /etc/kubernetes.
Determined that the appropriate nodeSelector key would be: control-plan

And does not change kube-scheduler.yaml any more.

Which looks now like this:

...
  hostNetwork: true
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - hostPath:
      path: /etc/kubernetes/scheduler-config-tas+gas.yaml
    name: schedulerconfig
  - hostPath:
      path: /etc/certs
    name: certdir
  - hostPath:
      path: /etc/kubernetes/scheduler.conf
      type: FileOrCreate
    name: kubeconfig
status: {}

After the initial run of the script, there were additional path: & type: lines for the last (kubeconfig) volume, in addition to path: & type: lines you see above.

madalazar commented 1 year ago

Hi, I tried to use the same command as you did and this was my output:

Command

./telemetry-aware-scheduling/deploy/extender-configuration/configure-scheduler.sh -f ./gpu-aware-scheduling/deploy/extender-configuration/scheduler-config-tas+gas.yaml

Initial manifest file

    - mountPath: /etc/kubernetes/scheduler-config.yaml
      name: schedulerconfig
      readOnly: true
    - mountPath: /host/certs
      name: certdir
    - mountPath: /etc/kubernetes/scheduler.conf
      name: kubeconfig
      readOnly: true
  hostNetwork: true
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - hostPath:
      path: /etc/kubernetes/scheduler-config.yaml
    name: schedulerconfig
  - hostPath:
      path: /etc/certs
    name: certdir
  - hostPath:
      path: /etc/kubernetes/scheduler.conf
      type: FileOrCreate
    name: kubeconfig

Output

  volumeMounts:
    - mountPath: /etc/kubernetes/scheduler-config-tas+gas.yaml
      name: schedulerconfig
      readOnly: true
    - mountPath: /host/certs
      name: certdir
    - mountPath: /etc/kubernetes/scheduler-config.yaml
    - mountPath: /etc/kubernetes/scheduler.conf
      name: kubeconfig
      readOnly: true
  hostNetwork: true
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - hostPath:
      path: /etc/kubernetes/scheduler-config-tas+gas.yaml
    name: schedulerconfig
  - hostPath:
      path: /etc/certs
    name: certdir
  - hostPath:
      path: /etc/kubernetes/scheduler-config.yaml
      path: /etc/kubernetes/scheduler.conf
      type: FileOrCreate
    name: kubeconfig

The problem I found is here :

-- hostPath: path: /etc/kubernetes/scheduler-config.yaml path: /etc/kubernetes/scheduler.conf type: FileOrCreate name: kubeconfig

From what I could tell the root cause is this line https://github.com/intel-innersource/libraries.orchestrators.resourcemanagement.platform-aware-scheduling/blob/master/telemetry-aware-scheduling/deploy/extender-configuration/configure-scheduler.sh#L162

sed -i "/$scheduler_config_file/d" "$MANIFEST_FILE"

I started with the default manifest file (deploy/extender/scheduler-config.yaml), but when I specified a new one -f ./gpu-aware-scheduling/deploy/extender-configuration/scheduler-config-tas+gas.yaml, the $MANIFEST_FILE would be scheduler-config-tas+gas.yaml not scheduler-config.yaml so sed can't find and won't remove it.

Not sure if this is exactly the same problem as in my case, the scheduler was still working (the scheduler pod started successfully, and then I was able to schedule the demo pod according to the TAS health-demo policy). I created a follow-up item for us to work on to address this particular problem.

In terms of priorities, I think we will be starting with 2 items first: fixing the bug above and having the file create backups as I think these are more urgent. I'll be picking these up in the next sprint (following week). I will update this issue once they will be released.

eero-t commented 1 year ago

IMHO YAML modifications should be done with something that actually understands YAML (e.g. Python script using pyyaml module), otherwise it's too easy for the script to break things.

madalazar commented 1 year ago

Hi,

Apologies for the late reply. I logged an issue in our backlog about this problem (it was also referenced here https://github.com/intel/platform-aware-scheduling/issues/86), but we weren't able to pick-up the work so far. I'm working on a new release for TAS, but unfortunately this change did not make it.

My plan is to look this week/Monday at what changes we can work on in March and then what we plan for the next quarter. I'll be able to give a more specific date then.

Just to keep things clear, I will continue to update this issue regarding the changes required for the configuration script and just update issue#86 with regards to your last request.