intel / kubernetes-power-manager

unable to configure shared power profile per node via shared power workload #73

Open amshankaran opened 6 months ago

amshankaran commented 6 months ago

Hi, I have created two shared power workloads, each using a different shared power profile. When I apply the shared workloads to two different nodes, the frequency settings on both nodes are overwritten by the power profile of the last applied workload.

Profiles:

apiVersion: power.intel.com/v1
kind: PowerProfile
metadata:
  name: shared-node1
  namespace: intel-power
spec:
  name: "shared-node1"
  max: 2500
  min: 1500
  shared: true
  governor: "powersave"

apiVersion: power.intel.com/v1
kind: PowerProfile
metadata:
  name: shared-node2
  namespace: intel-power
spec:
  name: "shared-node2"
  max: 2600
  min: 1600
  shared: true
  governor: "powersave"

Workloads:

apiVersion: "power.intel.com/v1"
kind: PowerWorkload
metadata:
  name: shared-node1-workload
  namespace: intel-power
spec:
  name: shared-node1-workload
  allCores: true
  reservedCPUs:
    - 0
    - 1
  powerNodeSelector:
    kubernetes.io/hostname: node1
  powerProfile: shared-node1

apiVersion: "power.intel.com/v1"
kind: PowerWorkload
metadata:
  name: shared-node2-workload
  namespace: intel-power
spec:
  name: shared-node2-workload
  allCores: true
  reservedCPUs:
    - 0
    - 1
    - 2
    - 3
    - 4
  powerNodeSelector:
    kubernetes.io/hostname: node2
  powerProfile: shared-node2
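
The two workloads above reserve different CPU sets on each node. As a rough sketch of the split one would expect from those `reservedCPUs` lists, the Python below divides a node's cores into a reserved range and a shared range. The 64-core count and the helper names `to_ranges` and `split_cores` are assumptions made for illustration; this is not the operator's code.

```python
def to_ranges(cores):
    """Collapse a collection of core IDs into a range string such as '0-1,4'."""
    ordered = sorted(set(cores))
    ranges = []
    start = prev = None
    for core in ordered:
        if start is None:
            start = prev = core
        elif core == prev + 1:
            prev = core
        else:
            ranges.append((start, prev))
            start = prev = core
    if start is not None:
        ranges.append((start, prev))
    return ",".join(f"{a}-{b}" if a != b else str(a) for a, b in ranges)


def split_cores(total_cores, reserved):
    """Return (reserved range, shared range) for a node with total_cores cores."""
    reserved_set = set(reserved)
    shared = [c for c in range(total_cores) if c not in reserved_set]
    return to_ranges(reserved_set), to_ranges(shared)


# Hypothetical 64-core nodes with the reservedCPUs lists from the workloads above.
print(split_cores(64, [0, 1]))           # ('0-1', '2-63')
print(split_cores(64, [0, 1, 2, 3, 4]))  # ('0-4', '5-63')
```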

After creating the power profiles, I applied the first workload, shared-node1-workload:

powernode

- apiVersion: power.intel.com/v1
  kind: PowerNode
  metadata:
    name: node1
    namespace: intel-power
  spec:
    nodeName: node1
    powerProfiles:
    - 'performance: 3600000 || 3400000 || '
    - 'balance-power: 2400000 || 2200000 || '
    powerWorkloads:
    - 'balance-power: balance-power || '
    - 'performance: performance || '
    sharedPool: shared-node1 || 2500000 || 1500000 || 2-63
    unaffectedCores: 0-1
- apiVersion: power.intel.com/v1
  kind: PowerNode
  metadata:
    name: node2
    namespace: intel-power
  spec:
    nodeName: node2
    powerProfiles:
    - 'performance: 3600000 || 3400000 || '
    - 'balance-power: 2400000 || 2200000 || '
    powerWorkloads:
    - 'performance: performance || '
    - 'balance-power: balance-power || '
    unaffectedCores: 0-63

After applying the workload shared-node2-workload, the frequencies changed on node1 as well, and its shared pool switched to shared-node2:

powernode

- apiVersion: power.intel.com/v1
  kind: PowerNode
  metadata:
    name: node1
    namespace: intel-power
  spec:
    nodeName: node1
    powerProfiles:
    - 'balance-power: 2400000 || 2200000 || '
    - 'performance: 3600000 || 3400000 || '
    powerWorkloads:
    - 'performance: performance || '
    - 'balance-power: balance-power || '
    sharedPool: shared-node2 || 2600000 || 1600000 || 2-63   # <== changed (was shared-node1 || 2500000 || 1500000)
    unaffectedCores: 0-1
- apiVersion: power.intel.com/v1
  kind: PowerNode
  metadata:
    name: node2
    namespace: intel-power
  spec:
    nodeName: node2
    powerProfiles:
    - 'balance-power: 2400000 || 2200000 || '
    - 'performance: 3600000 || 3400000 || '
    powerWorkloads:
    - 'balance-power: balance-power || '
    - 'performance: performance || '
    sharedPool: shared-node2 || 2600000 || 1600000 || 5-63
    unaffectedCores: 0-4

The reserved CPU configurations were applied properly, but the frequencies and the profile selection seem to be wrong. Is it not possible to use a specific shared profile per node?
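
To make the mismatch easy to check, here is a minimal sketch that parses a `sharedPool` string and compares it with the profile a node is supposed to use. It assumes the `profile || max || min || cores` layout seen in the PowerNode output and that the profile's `max`/`min` values appear there multiplied by 1000 (2500 becomes 2500000). `SharedPool`, `parse_shared_pool` and `matches_profile` are hypothetical helper names, not part of the power manager.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedPool:
    profile: str
    max_freq: int
    min_freq: int
    cores: str


def parse_shared_pool(value: str) -> SharedPool:
    """Split 'profile || max || min || cores' into its four fields."""
    profile, max_freq, min_freq, cores = [part.strip() for part in value.split("||")]
    return SharedPool(profile, int(max_freq), int(min_freq), cores)


def matches_profile(pool: SharedPool, name: str, profile_max: int, profile_min: int) -> bool:
    """True if the pool reports the named profile with its frequencies scaled by 1000."""
    return (pool.profile, pool.max_freq, pool.min_freq) == (
        name,
        profile_max * 1000,
        profile_min * 1000,
    )


# node1's sharedPool after the second workload was applied.
node1_after = parse_shared_pool("shared-node2 || 2600000 || 1600000 || 2-63")
print(matches_profile(node1_after, "shared-node1", 2500, 1500))  # False
```

Running the same check against node1's `sharedPool` from before the second workload was applied returns `True`, which is the state expected to persist.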

power-manager version: 2.3.1

adorney99 commented 5 months ago

Hi @amshankaran, we've replicated this issue and determined that the cause is shared profiles overwriting the shared pools on nodes they aren't requested on. Unfortunately, we're currently in a release cycle, so we won't be able to fix this in the upcoming release, but we'll make sure it is fixed in the following one. For now, clusters are limited to using one shared profile at a time because of this. Thanks for bringing this to our attention and including the steps to reproduce.
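
As an illustration of the behaviour described above, the sketch below computes which shared profile each node would be expected to end up with when every shared workload is applied only to nodes matching its `powerNodeSelector`. It is a plain-Python model written for this thread, not the operator's implementation; `selector_matches` and `expected_shared_profiles` are hypothetical names, and the node labels are assumed to mirror the hostnames used in the report.

```python
def selector_matches(selector: dict, labels: dict) -> bool:
    """True if every key/value pair in the selector is present in the node labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def expected_shared_profiles(workloads: list, nodes: dict) -> dict:
    """Map each node name to the shared profile of the workload that selects it."""
    expected = {}
    for node_name, labels in nodes.items():
        for workload in workloads:
            if selector_matches(workload["powerNodeSelector"], labels):
                expected[node_name] = workload["powerProfile"]
    return expected


workloads = [
    {
        "name": "shared-node1-workload",
        "powerNodeSelector": {"kubernetes.io/hostname": "node1"},
        "powerProfile": "shared-node1",
    },
    {
        "name": "shared-node2-workload",
        "powerNodeSelector": {"kubernetes.io/hostname": "node2"},
        "powerProfile": "shared-node2",
    },
]

# Assumed node labels, taken from the hostnames in the report.
nodes = {
    "node1": {"kubernetes.io/hostname": "node1"},
    "node2": {"kubernetes.io/hostname": "node2"},
}

print(expected_shared_profiles(workloads, nodes))
# {'node1': 'shared-node1', 'node2': 'shared-node2'}
```

In the reported output, node1 shows `shared-node2` instead, which is the overwrite on a node the profile was not requested on.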

amshankaran commented 4 months ago

@adorney99 is there any plan for the fix? May I know in which version it is expected to be fixed? Is this fixed in v2.4?

adorney99 commented 4 months ago

Hi @amshankaran, really sorry for the late response, but I wanted to have a definitive answer for you. We have a fix for the issue and just need a release to include it in. There's a chance we'll be doing a small minor release where it can be included, but it's more than likely that this won't be included until version 2.5 in May. Really sorry for the wait on this.

kashifest commented 4 months ago

It would be really great if the minor/patch release includes the fix sooner than May!

amshankaran commented 1 month ago

@adorney99 is this issue fixed in v2.5.0?

adorney99 commented 1 month ago

Hi @amshankaran, yes, the issue should be resolved in v2.5.0. Sorry for the long wait on a fix for this.

amshankaran commented 1 month ago

@adorney99 Thanks, I'll test and confirm the behaviour.