Closed by donahuem 5 years ago
Don't pull .... we'll set up some new ones.
Sent a new set of regular runs. Each job is a 50-task array. Each task is 100 runs. 58137334 58137380 58137427 58137498
These ran just fine according to std.out, which records all 100 runs in each file. However, only a subset of runs was written to summaryOut, BfinN, etc. I assume that scratch ran out of space, but because the warnings were not being written out, I'm not sure.
Solutions: write out warnings (options(warn=1) added to Phenology.R); set writeBout to 0 for now. Can go back and rerun for within-year dynamics if it's critical.
New set of runs. 4 sets of runs; 50 tasks of 100 runs
58378409 58378458 58378477 58378492
@lizzieinvancouver You can pull these runs. Note that the number of runs per task ranges from 75-100 b/c of memory issues
About 85% of the runs in each task were saved. I think I have exceeded the memory allocation in the jobs. The runs vary in time from ~1:30 to 3:30, but all have MaxRSS ~= 100Mb. While I thought I was asking for 1000Mb per run, 100Mb is the default. I think that the white space in my slurm job script might mean that the additional specs (i.e., mem=1000) were not included in the call, so we got the default memory. Just a guess. Testing this by sending another batch.
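One way to test the white-space guess without resubmitting is to scan the job script for directive lines with suspicious spacing. The sketch below is only illustrative: the function name `check_sbatch_lines` and the two patterns it looks for are assumptions, not part of the actual job script.

```shell
# Sketch: flag #SBATCH lines whose spacing may keep Slurm from applying them.
# check_sbatch_lines is a hypothetical helper; $1 is the path to a job script.
check_sbatch_lines() {
    # "# SBATCH" (space after #) is read as an ordinary comment, not a directive.
    # Spaces around "=" (e.g. "--mem = 1000") can break option parsing.
    grep -nE '^#[[:space:]]+SBATCH|^#SBATCH.*[[:space:]]=|^#SBATCH.*=[[:space:]]' "$1"
}
```

Any line it prints (with its line number) is worth a second look. What a job actually requested and used can then be compared with `sacct -j <jobid> --format=JobID,State,Elapsed,ReqMem,MaxRSS`.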
Next batch of runs! 58550919 58550988 58551048 58551123
These runs are complete. Not all the runs were saved (~85%). Checked ReqMem (requested memory) and it is 1000Mb for these and all preceding jobs. So much for that idea.
I had assumed that scratch was separate for each node and named the scratch folder by the jobID. If it is shared, then the first job to finish might be moving all files in the jobID folder. Instead, name the folder by jobID-taskID. Rerunning: 58581670 58581688 58581745 58581780
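The jobID-taskID naming can be sketched as below. This is a minimal illustration, not the actual job script: `task_scratch_dir` and `SCRATCH_ROOT` are hypothetical names, and the fallback values exist only so the snippet runs outside Slurm. `SLURM_ARRAY_JOB_ID`, `SLURM_JOB_ID`, and `SLURM_ARRAY_TASK_ID` are the environment variables Slurm sets for array tasks.

```shell
# Sketch: build a per-task scratch path so array tasks never share a folder.
# task_scratch_dir and SCRATCH_ROOT are hypothetical, chosen for illustration.
task_scratch_dir() {
    # $1 = job ID, $2 = array task ID, $3 = scratch root
    printf '%s/%s-%s\n' "$3" "$1" "$2"
}

# Inside a Slurm array task the scheduler sets these variables;
# the fallbacks only let the sketch run outside Slurm.
JOB_ID="${SLURM_ARRAY_JOB_ID:-${SLURM_JOB_ID:-0}}"
TASK_ID="${SLURM_ARRAY_TASK_ID:-0}"
SCRATCH_ROOT="${SCRATCH_ROOT:-${TMPDIR:-/tmp}}"

SCRATCH_DIR="$(task_scratch_dir "$JOB_ID" "$TASK_ID" "$SCRATCH_ROOT")"
mkdir -p "$SCRATCH_DIR"
echo "Writing task output to $SCRATCH_DIR"
```

With one folder per task, a task that finishes early can only move its own files, whether or not scratch is shared across nodes.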
These ran all the way through.
Sent a new set of regular runs. Each job is a 50-task array. Each task is 200 runs. 54202087 54202148 54202210 54202241
Megan should check on these in the morning to make sure they are running as expected.