Closed GeorgeGayno-NOAA closed 2 years ago
The system admins said to use this command to determine how much memory a job is using:
sacct -j 3908276 --format=jobid,jobname,state,alloctres%35,maxrss
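To turn the maxrss column from that command into a value for --mem, a small helper along these lines might be useful. This is only a sketch: the function name maxrss_to_gb is hypothetical, and it assumes sacct prints MaxRSS as a number followed by a K, M, G, or T suffix (a bare number is treated as bytes).

```shell
# Hypothetical helper: convert a MaxRSS value as printed by sacct
# (assumed form: number plus K/M/G/T suffix) to whole gigabytes, rounded up.
maxrss_to_gb() {
  awk -v v="$1" 'BEGIN {
    unit = substr(v, length(v), 1)   # last character is the unit suffix
    num = v + 0                      # awk keeps the leading numeric part
    if (unit == "K")      gb = num / 1048576
    else if (unit == "M") gb = num / 1024
    else if (unit == "G") gb = num
    else if (unit == "T") gb = num * 1024
    else                  gb = num / 1073741824   # no suffix: assume bytes
    whole = int(gb)
    if (gb > whole) whole++          # round up to the next whole GB
    print whole
  }'
}

# Example with a made-up MaxRSS value (not taken from a real job):
maxrss_to_gb 47185920K   # prints 45
```

The result is a lower bound; some headroom above the measured peak is probably wise before passing it to --mem.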
Using the sacct command, I adjusted the requested memory for each test (b5d6ab6). Then I tested the updated script on Orion.
All tests were successfully run six times in a row. Previously, one test (of the 16) would always fail.
Occasionally, some of the chgres_cube tests fail with a 'bus error'. The failures are random. The system admins recommend explicitly requesting the memory each job needs in the driver script, for example --mem=50G. Preliminary tests show this solves the problem. (The default memory allocated by Slurm on Orion is 54GB.)
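As a rough illustration of the admins' suggestion, an explicit memory request in a Slurm batch script could look like the sketch below. The job name, node count, task count, and time limit are placeholders, not the values used by the actual driver script; only --mem=50G comes from this issue.

```shell
#!/bin/bash
# Hypothetical job header; all values except --mem are placeholders.
#SBATCH --job-name=chgres_test
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=6
#SBATCH --time=00:15:00
#SBATCH --mem=50G    # explicit per-node memory request, instead of relying on the default

# Slurm exports the per-node request (in MB) to the job environment.
echo "Requested memory per node: ${SLURM_MEM_PER_NODE:-unset} MB"
```

The same option can also be given on the sbatch command line, which may be more convenient when each test needs a different amount.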