treydock / puppet-slurm

slurm::greses does not seem to work as expected #60

Open JacobJanzen opened 6 days ago

JacobJanzen commented 6 days ago

While trying to use this module, I noticed that gres values are not being populated when declared in slurm::greses. For testing purposes, I used the common.yaml given in the documentation, minus the line that specifies a repo base URL, which gives the following:

munge::munge_key_content: "supersecretsupersecretsupersecretsupersecretsupersecret"
slurm::install_torque_wrapper: true
slurm::install_pam: true
slurm::slurm_group_gid: 93
slurm::slurm_user_uid: 93
slurm::slurm_user_home: /var/lib/slurm
slurm::manage_firewall: false
slurm::use_syslog: true
slurm::cluster_name: example
slurm::slurmctld_host:
  - slurmctld.example.com
slurm::slurmdbd_host: slurmdbd.example.com
slurm::greses:
  nvml:
    auto_detect: nvml
slurm::slurmd_spool_dir: /var/spool/slurmd
slurm::slurm_conf_override:
  AccountingStorageTRES:
    - gres/gpu
    - gres/gpu:tesla
    - license/ansys
  Licenses:
    - ansys:2
  ReturnToService: 2
  SelectType: select/cons_tres
  SelectTypeParameters:
    - CR_CPU
slurm::partitions:
  batch:
    default: 'YES'
    def_mem_per_cpu: 1700
    max_mem_per_cpu: 1750
    nodes: slurmd01
slurm::nodes:
  slurmd01:
    node_hostname: slurmd01.example.com
    cpus: 4
    threads_per_core: 1
    cores_per_socket: 1
    sockets: 4
    real_memory: 7000

Puppet runs without error, but upon logging into the server, I find that no generic resources are defined in /etc/slurm/slurm.conf, nor is there any file such as /etc/slurm/gres.conf that might contain the resource definition. To rule out that the resource is just being written somewhere I haven't found, I tried changing the name of the resource, which should have logged a change on the next Puppet run. Puppet did not change anything, though.

treydock commented 5 days ago

You need to set

slurm::include_resources: true
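
As a minimal sketch, the setting above might sit next to the slurm::greses block from the original report like this. Both keys are taken from this thread; the rest of common.yaml is assumed to stay as posted, and whether any further settings are required is not confirmed here.

```yaml
# Sketch only: both keys come from this thread.
# The remaining keys in common.yaml are assumed unchanged.
slurm::include_resources: true
slurm::greses:
  nvml:
    auto_detect: nvml
```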

JacobJanzen commented 18 hours ago

Thanks for the help! That option seems to require a slurm_config resource type which I cannot find any information about. I see that the only reference to that type is here. Is there some dependency that I might be missing that provides the type?