Closed DusanJovic-NOAA closed 6 months ago
Hi, @DusanJovic-NOAA . My RT builds are running under the normal partition. I'm using the epic account so it could be something to do with the account you're using perhaps, nggps_emc I assume?
Maybe it's because of the account I'm using, I'm not sure. But I think the queue should be explicitly set in the compile job card template; right now it is commented out:
$ grep QUEUE compile_slurm.IN_gaea
##SBATCH --qos=@[QUEUE]
I do not know why it is commented in the compile job card template but not in the run job card.
Ok, I will try setting it explicitly to "normal" in the compile job card and see how it goes when using nggps_emc.
@DusanJovic-NOAA So if I set qos to normal explicitly in compile_slurm.IN_gaea, it fails with an invalid qos error when I'm using nggps_emc. With epic it works fine. Something about nggps_emc seems to require the windfall qos for builds. I've reached out to the Gaea admins for insight.
@DusanJovic-NOAA I have not received a response back from Gaea yet, but I managed to get nggps_emc to compile within the 'normal' queue by setting cluster=c5 and partition=batch in compile_slurm.IN_gaea. I think there is some account setting that makes nggps_emc use windfall when clusters and partition are set to es and eslogin_c5, respectively. I'm not certain whether there would be broader implications for setting these to c5/batch in compile_slurm.IN_gaea.
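For anyone following along, the change described above might look roughly like the fragment below. This is only a sketch: the values (es/eslogin_c5 and c5/batch) come from this thread, but the exact directive lines in compile_slurm.IN_gaea are an assumption, written here with the standard sbatch `--clusters` and `--partition` options.

```shell
# Sketch only: the real lines in compile_slurm.IN_gaea may be written differently.

# Combination under which nggps_emc builds appear to end up on windfall
# (disabled here with a doubled '#'):
##SBATCH --clusters=es
##SBATCH --partition=eslogin_c5

# Combination that allowed nggps_emc to compile under the 'normal' qos:
#SBATCH --clusters=c5
#SBATCH --partition=batch
```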
Unless there are any protests, I will close this issue as I don't think there are any code changes to be made here.
So will this line:
still be commented out?
If queue is not commented out the compilation will fail when using nggps_emc, because the queue will default to 'normal' as set by rt.sh for gaea, and it doesn't look like nggps_emc has access to the normal queue when on the es cluster/eslogin_c5 partition combination. So cluster and partition would also need to be changed to c5/batch.
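To make that concrete, here is a small sketch of what the compile job card line would turn into if it were uncommented and the queue filled in. The `render_qos_line` helper is hypothetical and uses plain `sed` as a stand-in; it is not how rt.sh actually processes the template.

```shell
# Hypothetical helper: uncomment the qos directive and fill in the queue name.
# $1 is the queue value (rt.sh is said above to default to 'normal' on gaea).
render_qos_line() {
  sed -e 's/^##SBATCH --qos=/#SBATCH --qos=/' \
      -e "s/@\[QUEUE\]/$1/"
}

# Feed it the line currently in compile_slurm.IN_gaea.
printf '%s\n' '##SBATCH --qos=@[QUEUE]' | render_qos_line normal
```

With nggps_emc on es/eslogin_c5, a line rendered this way is what would be rejected with the invalid qos error mentioned earlier.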
If sysadmins are okay with us running compile jobs on login nodes then fine.
@DusanJovic-NOAA Response from the Gaea admins: The login nodes and/or the eslogin_c5 partition are intended for compilation, and the maximum number of nodes allowed per job is one. You should be fine continuing to compile on the login nodes.
On Gaea all build regression test jobs are running on login nodes under windfall queue:
Is this intended?