stackhpc / ansible-role-openhpc

Ansible role for OpenHPC
Apache License 2.0

Add openhpc_ram_multiplier #93

Closed jovial closed 3 years ago

jovial commented 3 years ago

Using the node's total memory as the value of RealMemory in slurm.conf does not allow for OS overheads, and can cause Slurm to drain the nodes with the reason: Low RealMemory.

Fixes #92.

sjpb commented 3 years ago

@sjpb to run this through openhpc_tests on labs, and merge if OK.

sjpb commented 3 years ago

Testing:

[centos@*-hpc-0 ~]$ slurmd -C
NodeName=*-hpc-0 CPUs=16 Boards=1 SocketsPerBoard=2 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=128616
UpTime=1-05:31:27
[centos@*-control ~]$ grep RealMemory /etc/slurm/slurm.conf
    RealMemory=122185 \
    RealMemory=122185 \

So we can see the default 0.95 multiplier has been applied: 128616 x 0.95 = 122185.2, which appears in slurm.conf as 122185.
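As a quick cross-check of the figures above, here is a minimal Python sketch of the arithmetic. The function name `real_memory` is hypothetical, and truncating to a whole number of MB is an assumption inferred from the output shown, not taken from the role's templates:

```python
def real_memory(total_mb: int, multiplier: float = 0.95) -> int:
    """Scale the memory reported by `slurmd -C` and truncate to whole MB.

    Hypothetical helper: the truncation (rather than rounding) is an
    assumption based on the slurm.conf value shown in this thread.
    """
    return int(total_mb * multiplier)


# RealMemory reported by `slurmd -C` on the compute node above
reported = 128616
configured = real_memory(reported)
print(configured)  # 122185, matching the value grepped from slurm.conf

# The configured value leaves headroom below what the node reports
assert configured < reported
```

With a multiplier of 1.0 the configured value would equal the reported total, leaving no headroom for OS overheads, which is the situation this PR sets out to avoid.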

Now running:

ansible-playbook ansible/adhoc/test.yml -e "openhpc_tests_nodes=*-hpc-[0-3]"

...

sjpb commented 3 years ago

Ran openhpc_tests OK and the nodes are still up afterwards, so let's merge this @jovial.

jovial commented 3 years ago

> Ran openhpc_tests ok, nodes still up afterwards so lets merge this @jovial.

Your wish is my command - thanks for testing :)