stackhpc / ansible-role-openhpc

Ansible role for OpenHPC
Apache License 2.0

default node memory for slurm.conf set from ansible facts #81

Closed sjpb closed 3 years ago

sjpb commented 3 years ago

Currently node memory defaults to 1MB if ram_mb is not set in role vars. This changes the default to the real free memory as reported by ansible facts. Given the number of cpu cores etc. is already set from facts, this seems uncontroversial.
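Roughly, the fallback described here could be sketched as below. This is only an illustration in Python, not the role's actual template: the function name `default_real_memory` is hypothetical, and the choice of the `ansible_memory_mb.real.total` fact is an assumption, since the exact fact the PR reads isn't shown in this thread.

```python
def default_real_memory(role_vars, facts):
    """Pick the value to template as RealMemory in slurm.conf.

    Hypothetical helper: prefers an explicit ram_mb from role vars,
    otherwise falls back to a memory figure from ansible facts.
    """
    ram_mb = role_vars.get("ram_mb")
    if ram_mb is not None:
        # Explicit setting wins, as before this change.
        return int(ram_mb)
    # Assumption: use the ansible_memory_mb.real.total fact as the fallback.
    return int(facts["ansible_memory_mb"]["real"]["total"])


# Example facts for an imaginary node (values made up for illustration).
facts = {"ansible_memory_mb": {"real": {"total": 7821, "free": 6100}}}

print(default_real_memory({}, facts))                # falls back to facts
print(default_real_memory({"ram_mb": 4096}, facts))  # explicit value used
```

With no ram_mb set, the node gets a realistic memory figure instead of slurm's default of 1.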

sjpb commented 3 years ago

Used for https://github.com/stackhpc/ansible_collection_slurm_openstack_tools/pull/16

jovial commented 3 years ago

Sorry, checked the slurm docs and it seems that RealMemory is in megabytes :-/

Size of real memory on the node in megabytes (e.g. "2048"). The default value is 1. Lowering RealMemory with the goal of setting aside some amount for the OS and not available for job allocations will not work as intended if Memory is not set as a consumable resource in SelectTypeParameters. So one of the *_Memory options need to be enabled for that goal to be accomplished. Also see MemSpecLimit.

(https://slurm.schedmd.com/slurm.conf.html)

So not sure if we need to do a conversion from base 2 to base 10. Any idea?

sjpb commented 3 years ago

I thought "megabytes" tended to get used generically for either unit, esp. in older docs? So is the 2048 example a hint that it's expecting base2?

We're not doing a conversion really, it's just that testing free vs ansible showed ansible was using mebibytes so wanted to document that.

Leave code as-is and just revert docs to say "megabytes"?

jovial commented 3 years ago

> I thought "megabytes" tended to get used generically for either unit, esp. in older docs? So is the 2048 example a hint that it's expecting base2?

Think you're right about that. From what I can tell, slurm seems to be using base2 internally: https://github.com/SchedMD/slurm/blob/23cbe39d98cfacd9434f10c19a415c6092e4c61c/src/slurmd/slurmd/get_mach_stat.c#L115 and referring to it as MB
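To put numbers on the base-2 vs base-10 question, here is a small sketch, assuming an imaginary node with exactly 8 GiB of RAM (the helper names are made up for illustration). If both ansible and slurm really do count in units of 1024 * 1024 bytes, as the discussion above suggests, the two figures match and no conversion is needed; the second figure only shows how far off a true base-10 megabyte count would be.

```python
MIB = 1024 ** 2  # mebibyte: base-2 "megabyte"
MB = 1000 ** 2   # megabyte: base-10


def bytes_to_mib(n_bytes):
    """Whole mebibytes in n_bytes (rounded down)."""
    return n_bytes // MIB


def bytes_to_mb(n_bytes):
    """Whole base-10 megabytes in n_bytes (rounded down)."""
    return n_bytes // MB


total_bytes = 8 * 1024 ** 3  # hypothetical node with 8 GiB of RAM

print(bytes_to_mib(total_bytes))  # 8192
print(bytes_to_mb(total_bytes))   # 8589
```

The gap is a little under 5%, so mixing the units up would not break anything badly, but it would misstate the memory available for job allocations.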