stackhpc / ansible-role-openhpc

Ansible role for OpenHPC
Apache License 2.0

filetxt storage locations not created if not default #102

Closed sjpb closed 3 years ago

sjpb commented 3 years ago

In openhpc v2.0 / slurm v20.02.5 the filetxt storage type can be used for accounting storage, job completion storage, or both. Locations for these are set by AccountingStorageLoc and JobCompLoc respectively. In openhpc v2.1 / slurm v20.11.3 filetxt storage can only be used for job completion storage. Both versions support DefaultStorageLoc, which can fill in for the other *StorageLoc parameters.

As of v0.7.0 the situation with this role is as follows:

If the accounting storage type is set to none and the job completion type is set to filetxt, slurmctld dies on startup with a permissions error for /var/log/slurm_jobacct.log. Changing JobCompLoc to /var/log/slurm_jobcomp.log doesn't help: the file must be created "manually" before slurmctld starts.

If AccountingStorageLoc is manually added to the slurm.conf template with a non-default location, then slurmctld similarly dies on startup. So it appears that there is some special-case creation for /var/log/slurm_jobacct.log, but only if it is used for accounting storage, not for job completion storage.
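
For reference, the first failure case above corresponds to role variables roughly like the following. This is a sketch, not a tested reproduction: openhpc_slurm_job_comp_type and openhpc_slurm_job_comp_loc are the role variables discussed in this issue, while openhpc_slurm_accounting_storage_type and its accounting_storage/none value are assumed names that should be checked against the role defaults.

```yaml
# Sketch of role variables for the first failure case (some names assumed).
# Assumed variable name and value for "accounting storage type set to none":
openhpc_slurm_accounting_storage_type: accounting_storage/none
# Job completion records written to a plain text file:
openhpc_slurm_job_comp_type: jobcomp/filetxt
# Reported to fail the same way with /var/log/slurm_jobacct.log:
openhpc_slurm_job_comp_loc: /var/log/slurm_jobcomp.log
```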


sjpb commented 3 years ago

Given AccountingStorageLoc can't currently be controlled from the role, and goes away in ohpc v2.1 anyway, we don't need to worry about that. I think we should fix the role, though, so that the file at openhpc_slurm_job_comp_loc / JobCompLoc is guaranteed to exist if openhpc_slurm_job_comp_type / JobCompType is set to jobcomp/filetxt. Otherwise the user has to create it before running the role.
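
A minimal sketch of what such a fix could look like as an Ansible task, not necessarily how the role implements it. It assumes the task runs on the control node before slurmctld is started, and that slurmctld runs as a user and group named slurm (the owner, group, and mode values are illustrative assumptions).

```yaml
# Sketch only: create the job completion log if jobcomp/filetxt is in use.
# Should be restricted to the host running slurmctld and ordered before
# the service is started.
- name: Ensure JobCompLoc file exists for jobcomp/filetxt
  ansible.builtin.file:
    path: "{{ openhpc_slurm_job_comp_loc }}"
    state: touch
    owner: slurm   # assumed SlurmUser
    group: slurm   # assumed group
    mode: "0644"   # illustrative permissions
    # Preserve timestamps so repeated runs report no change.
    access_time: preserve
    modification_time: preserve
  when: openhpc_slurm_job_comp_type == 'jobcomp/filetxt'
```

Using state: touch with preserved timestamps keeps the task idempotent: an existing log is left in place and only a missing file is created.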

sjpb commented 3 years ago

The next question is whether the role's default for JobCompLoc should be changed to /var/log/slurm_jobcomp.log. I suggest not, given that is another change, and the docs for DefaultStorageLoc suggest that using the same file for both accounting and job completion storage is ok anyway.