koordinator-sh / koordinator

A QoS-based scheduling system brings optimal layout and status to workloads such as microservices, web services, big data jobs, AI jobs, etc.
https://koordinator.sh
Apache License 2.0

[proposal] Utilize PMEM hardware to implement a new memory policy when the system memory exceeds safety threshold #113

Open · vivianzh opened this issue 2 years ago

vivianzh commented 2 years ago

What is your proposal: On machines with PMEM hardware, we can use this hardware capability to migrate BE pods' memory to PMEM, instead of evicting the pods directly, when system memory usage reaches the safety threshold.

Why is this needed: PMEM can be used as system memory, so we suggest implementing a differentiated memory policy for machines with PMEM hardware that makes full use of this capability.

Is there a suggested solution, if so, please add it: When memory utilization exceeds the safety threshold, we can migrate BE pods' memory to PMEM. The BE pods can then either keep running on PMEM or have their CPU temporarily frozen; either way, memory pressure on LSR & LS pods is reduced. When memory utilization falls far below the safety threshold, we can migrate BE pods' memory back to DDR.
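The threshold behaviour described above can be sketched in Go, the language Koordinator is written in. This is a minimal, hypothetical sketch, not Koordinator's implementation: `Policy`, `Decide`, `RecallThreshold`, and the 0.90 / 0.60 values are illustrative assumptions, since the proposal only says "far below the safety threshold" without a number. The sketch covers only the decision; how pages are actually moved between DDR and PMEM is not specified in the proposal and is left out.

```go
package main

import "fmt"

// Action is the (hypothetical) decision taken for BE pod memory.
type Action int

const (
	Stay Action = iota
	MigrateToPMEM
	MigrateBackToDDR
)

func (a Action) String() string {
	switch a {
	case MigrateToPMEM:
		return "migrate BE memory to PMEM"
	case MigrateBackToDDR:
		return "migrate BE memory back to DDR"
	default:
		return "stay"
	}
}

// Policy holds hypothetical thresholds as fractions of node memory.
type Policy struct {
	SafetyThreshold float64 // above this, BE memory moves to PMEM instead of evicting BE pods
	RecallThreshold float64 // assumed "far below" level; below this, BE memory returns to DDR
}

// Decide returns the action for the current utilization. onPMEM reports
// whether BE pod memory currently lives on PMEM. Using two separate
// thresholds (hysteresis) keeps memory from bouncing between DDR and PMEM
// when utilization hovers around the safety threshold.
func (p Policy) Decide(utilization float64, onPMEM bool) Action {
	switch {
	case !onPMEM && utilization > p.SafetyThreshold:
		return MigrateToPMEM
	case onPMEM && utilization < p.RecallThreshold:
		return MigrateBackToDDR
	default:
		return Stay
	}
}

func main() {
	p := Policy{SafetyThreshold: 0.90, RecallThreshold: 0.60}
	onPMEM := false
	// Simulated node memory utilization samples.
	for _, u := range []float64{0.70, 0.93, 0.85, 0.75, 0.55} {
		a := p.Decide(u, onPMEM)
		fmt.Printf("util=%.2f onPMEM=%v -> %v\n", u, onPMEM, a)
		switch a {
		case MigrateToPMEM:
			onPMEM = true
		case MigrateBackToDDR:
			onPMEM = false
		}
	}
}
```

With these sample values, the 0.93 reading triggers a move to PMEM, the 0.85 and 0.75 readings leave memory on PMEM, and only the 0.55 reading moves it back to DDR.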

hormes commented 2 years ago

It is a great idea, thanks @vivianzh

What methods for operating PMEM does the system provide, and how can a pod's memory be moved to PMEM?