intel / kubernetes-power-manager

Apache License 2.0

Cstates logic to define and enable multiple states for a profile #69

Closed sahanirn closed 2 months ago

sahanirn commented 8 months ago

@adorney99 Is it possible to define and enable multiple C-states for a single profile or for the shared pool? For example, enabling multiple states for the shared pool as below:

```yaml
sharedPoolCStates:
  C1: true
  C6: true
exclusivePoolCStates:
  performance:
    C1: true
    C6: false
```

With the above config I observed that when a pod was deployed with the performance profile, the cores allocated to the pod remained in the C6 state and did not transition from C6 to C1. When the C-state configuration for the shared pool was changed from `C1: true` to `C1: false`, the performance pods then transitioned to the C1 state. Is this the expected behavior? Can't we enable multiple states for the shared pool? Also, is it possible to enable multiple states for individual profiles?

adorney99 commented 8 months ago

Hi @sahanirn, yes, you can set all available C-states for a single profile or for the shared pool. It's even encouraged, since disabling a reduced-speed C-state such as C1E while enabling a full-sleep C-state like C6 doesn't make much sense (although as far as I know C6 will still be treated as disabled in this scenario, it's best to properly disable it to be safe).

I haven't been able to reproduce the issue you're seeing: when I tried this, I saw cores transition out of C6 once they were placed in the performance pool. You can verify that the manager is working correctly by checking `/sys/devices/system/cpu/cpu<N>/cpuidle/`. That directory contains a subdirectory for each C-state, and you can check whether the C-states are set correctly by looking at the `name` and `disable` files in each.
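To make that check concrete, here's a small sketch assuming the standard Linux cpuidle sysfs layout (`SYSFS_ROOT` is a hypothetical override so the script can be pointed at a test tree; on a real node leave it at the default):

```shell
# Print each C-state's name and disable flag for every logical CPU.
# SYSFS_ROOT defaults to the real sysfs tree; override it for testing.
SYSFS_ROOT="${SYSFS_ROOT:-/sys/devices/system/cpu}"

list_cstates() {
  # $1 is a per-CPU directory such as /sys/devices/system/cpu/cpu0
  cpu_dir="$1"
  for state in "$cpu_dir"/cpuidle/state*; do
    [ -d "$state" ] || continue
    printf '%s %s disabled=%s\n' \
      "$(basename "$cpu_dir")" \
      "$(cat "$state/name")" \
      "$(cat "$state/disable")"
  done
}

for cpu in "$SYSFS_ROOT"/cpu[0-9]*; do
  [ -d "$cpu" ] || continue
  list_cstates "$cpu"
done
```

A `disabled=1` line means the manager has switched that C-state off for the core; `disabled=0` means it is still available.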

My only guess as to what's causing the issue you're seeing is that you have hyperthreading enabled and are scheduling pods with an odd number of logical cores. Having different C-states on core siblings is a bit of a grey area: while I've never seen it happen, a core in the shared pool may be influencing the C-states of its sibling in the performance pool. That aside, we recommend sizing pods to even core counts or disabling hyperthreading, because core siblings can only share a single frequency, which causes problems when they're in different pools.
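One way to see whether an exclusive core shares a physical core with a shared-pool sibling is to read the kernel's topology files. A sketch assuming the standard sysfs topology layout (`TOPO_ROOT` is a hypothetical override for testing):

```shell
# Print the thread siblings of each logical CPU, e.g. "cpu0: 0,32".
TOPO_ROOT="${TOPO_ROOT:-/sys/devices/system/cpu}"

siblings_of() {
  # $1 is a logical CPU name such as cpu0
  cat "$TOPO_ROOT/$1/topology/thread_siblings_list"
}

for cpu in "$TOPO_ROOT"/cpu[0-9]*; do
  [ -f "$cpu/topology/thread_siblings_list" ] || continue
  name=$(basename "$cpu")
  printf '%s: %s\n' "$name" "$(siblings_of "$name")"
done
```

If a CPU in the exclusive pool lists a sibling that is still in the shared pool, the two pools are sharing a physical core.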

sahanirn commented 8 months ago

@adorney99 Thanks for your response. I have also observed an issue: when we reserve 2 cores as reservedSystemCpus and deploy pods with any profile and any number of cores, all C-states and P-states work as expected. But when I reserve more than 2 cores (for example 4, 5, 6, ...), the pod gets deployed with the specified profile, yet the core allocation to the profile is not correct. What could be the reason for this behavior?

[screenshot: PowerNode describe output]

Here, after describing the PowerNode, the power container shows the exclusive CPUs, but the profiles show no cores. I also verified the max/min frequencies of the exclusive cores shown here: they remain in the shared-pool frequency range instead of the performance-profile frequency range. I'd like to know what factor is affecting this behavior.
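For reference, the frequency check can be done straight from cpufreq sysfs. A sketch assuming the standard layout (`FREQ_ROOT` is a hypothetical override for testing):

```shell
# Print the current scaling min/max frequency (kHz) of one logical CPU.
FREQ_ROOT="${FREQ_ROOT:-/sys/devices/system/cpu}"

freq_range() {
  # $1 is a logical CPU name such as cpu4
  printf '%s min=%s max=%s\n' "$1" \
    "$(cat "$FREQ_ROOT/$1/cpufreq/scaling_min_freq")" \
    "$(cat "$FREQ_ROOT/$1/cpufreq/scaling_max_freq")"
}
```

Exclusive cores under a performance profile should report a higher min/max range than shared-pool cores; identical ranges suggest the profile was never applied.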

adorney99 commented 8 months ago

@sahanirn I haven't seen behavior like that before, but my best guess would be that the extra cores you're reserving aren't actually reserved by the kubelet, meaning they can still be scheduled (I'm assuming cores 0-1 are reserved but 2-3 aren't, since they show up in the exclusive CPUs list). This can lead to undefined behavior, because cores get allocated that the power manager has been instructed not to operate on, since they're supposed to be system reserved.
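If that's the case, the kubelet's own reservation would need to match. With the static CPU manager policy, the reserved set is declared in the KubeletConfiguration; a minimal sketch (the CPU list `0-3` is illustrative, not taken from this thread):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static
# Reserve all four cores so the kubelet never hands them to pods;
# this list should match the cores the power manager is told to skip.
reservedSystemCPUs: "0-3"
```

When the two lists agree, a core is either schedulable and managed, or reserved and untouched, with no overlap in between.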