sr320 / course-fish546-2018


Keeping track of slurm.out files #91

Closed kimh11 closed 5 years ago

kimh11 commented 5 years ago

In an attempt to get results before today, I started a number of jobs on Mox, and now I have a bunch of slurm.out files: slurm-490043.out, slurm-490303.out, slurm-490304.out, slurm-490307.out, slurm-491153.out, slurm-492049.out

Is there a good way to match these to the slurm script? Or a way to name the .out files in the script itself?

I've been doing a variation of these:

  1. Copy and paste the job number to a separate document (when I remember, which is rarely)
  2. Search my email for the job number
  3. Head/tail to find a hint of which script it was
  4. ls by reverse time to figure out which cluster of output files it belongs to

There's got to be a better way!

Thank you!

sr320 commented 5 years ago

I suggest that every job you run gets its own output directory (working directory).

see https://github.com/RobertsLab/hyak_mox/wiki/Data-Storage-&-System-Organization#suggested-user-organization

This way you only have one slurm.out file in each directory, and you only have to keep track of your slurm job scripts to know what you did and where the output is.
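One way to follow this convention is a small wrapper that creates a fresh directory for each submission and submits from inside it. This is only a minimal sketch, not the lab's actual setup: `JOBS_ROOT`, `my_job.sh`, and the directory naming scheme are hypothetical, and the `sbatch` call is skipped when Slurm is not installed.

```shell
#!/bin/bash
# Sketch: give every submission its own working directory.
# JOBS_ROOT and my_job.sh are hypothetical placeholders.
JOBS_ROOT="${JOBS_ROOT:-$(mktemp -d)}"
job_script="${1:-my_job.sh}"

# Directory named by date/time plus script name, e.g. 20181206-130200_my_job
run_dir="${JOBS_ROOT}/$(date +%Y%m%d-%H%M%S)_$(basename "${job_script}" .sh)"
mkdir -p "${run_dir}"

# Keep a copy of the exact script next to its output.
if [ -f "${job_script}" ]; then
  cp "${job_script}" "${run_dir}/"
fi

# Submit from inside the new directory so slurm-<jobid>.out lands there.
# Guarded so the sketch still runs on a machine without Slurm.
if command -v sbatch >/dev/null 2>&1 && [ -f "${job_script}" ]; then
  (cd "${run_dir}" && sbatch "$(basename "${job_script}")")
fi

echo "${run_dir}"
```

By default sbatch writes slurm-<jobid>.out into the directory the job was submitted from, so each directory should end up holding one script and one log.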

kubu4 commented 5 years ago

There are a couple of approaches:

  1. Organizationally - Create a new working directory every time you submit a job. Then, each working directory will only contain a single slurm file.

  2. Computationally - Direct output to a file of your choosing. Here's an email from UW IT with a suggestion on how to do just that:

You can always redirect your script's output separately from the scheduler's output file. The scheduler's output file can be modified, and this is documented in man sbatch or at https://slurm.schedmd.com/sbatch.html.

To redirect the standard output of the script, there are a couple of options:

out="tee -a slurm--$(date +%Y%m%d%H%M%S).out" (This results in a file in the working directory named like slurm-258698-20180815092524.out, with the first number being the slurm job ID number and the second number being the date in YYYYmmDDHHMMSS format. This format is my preferred format, as 'ls' will pre-sort it automatically with the oldest file being last, and there are no special characters that need escaping on the command line when accessing it.)

or

out="tee -a $(squeue -h -j $SLURM_JOB_ID -o slurm-%i-%S.out)" (This will result in a file named similar to slurm-116691-2018-08-15T08:40:02.out.)

{

} | $out

This will redirect standard out to the file named in the $out variable. You can also change the path of the file to a common output file directory if you want. All the standard error output will still go to the default slurm-jobid.out file. - Adam
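Put together, the pattern from the email might look like the sketch below. It assumes the job ID comes from `$SLURM_JOB_ID` (as in the second option), and it calls `tee` directly with a quoted file name instead of storing the command in `$out`. The `OUT_DIR` variable, the `nojob` fallback, and the `echo` commands are placeholders added for illustration.

```shell
#!/bin/bash
#SBATCH --job-name=example
# Sketch of the brace-and-tee pattern described in the email above.
# Outside of Slurm, SLURM_JOB_ID is unset, so a placeholder is used.
job_id="${SLURM_JOB_ID:-nojob}"
out_dir="${OUT_DIR:-.}"   # hypothetical: point this at a common log directory
out_file="${out_dir}/slurm-${job_id}-$(date +%Y%m%d%H%M%S).out"

{
  # Placeholder commands; replace with the real work of the job.
  echo "starting step 1"
  echo "this goes to stderr" >&2
} | tee -a "${out_file}"
```

Only standard output passes through the pipe, so the stderr line is not written to the named file. Quoting the file name also avoids word-splitting problems if the path ever contains spaces.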

sr320 commented 5 years ago

And you do know slurm output files are numbered in order of job ID, so you can easily determine the order of your slurm submissions.
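As a rough sketch of using that ordering, the functions below list slurm-<jobid>.out files sorted numerically by job ID and, when `sacct` is available, look up each job's name, working directory, and submit time. The function names are made up for this example, and how far back `sacct` can see depends on how the cluster's accounting is configured.

```shell
#!/bin/bash
# Sketch: list slurm-<jobid>.out files in job ID (submission) order.
# -t- -k2,2n sorts numerically on the number that follows "slurm-".
list_slurm_outs() {
  local dir="${1:-.}"
  find "${dir}" -maxdepth 1 -name 'slurm-*.out' -exec basename {} \; \
    | sort -t- -k2,2n
}

# For each file, print accounting details if sacct exists on this machine.
describe_slurm_outs() {
  local f id
  list_slurm_outs "${1:-.}" | while read -r f; do
    id="${f#slurm-}"
    id="${id%.out}"
    if command -v sacct >/dev/null 2>&1; then
      sacct -j "${id}" -X --noheader --format=JobID,JobName%30,WorkDir%60,Submit
    else
      echo "${f} (job ${id})"
    fi
  done
}
```

Calling `describe_slurm_outs .` in a directory full of slurm-*.out files would then print one line per job, oldest job ID first.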

kimh11 commented 5 years ago

Thank you, Sam and Steven! I'll try both! :)

Yup..but I never seem to remember which job I submitted first... 😄

sr320 commented 5 years ago

That's why I name all my job files as MMDD-HHMM!

On Dec 6, 2018, 1:02 PM -0800, HJ Kim wrote:

> Thank you, Sam and Steven! I'll try both! :) Yup..but I never seem to remember which job I submitted first... 😄
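A possible way to combine this naming habit with Slurm's own log naming is sketched below. It writes a job script named MMDD-HHMM.sh that contains `#SBATCH --output=%x-%j.out`; in sbatch filename patterns `%x` is the job name (which defaults to the script's file name) and `%j` is the job ID. The `WORK_DIR` variable and the placeholder command are assumptions, and `%x` may not be supported on older Slurm versions, so check man sbatch on your cluster.

```shell
#!/bin/bash
# Sketch: create a job script named MMDD-HHMM.sh whose log is named after it.
# WORK_DIR and the echo command are hypothetical placeholders.
work_dir="${WORK_DIR:-$(mktemp -d)}"
stamp="$(date +%m%d-%H%M)"
job_script="${work_dir}/${stamp}.sh"

# Quoted heredoc so %x and %j are written literally for sbatch to expand.
cat > "${job_script}" <<'EOF'
#!/bin/bash
#SBATCH --output=%x-%j.out
echo "placeholder for the real commands"
EOF

echo "${job_script}"
```

Submitting such a script with sbatch should give a log whose name starts with the script name and ends with the job ID, so the .out file points back to the script that produced it.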

kimh11 commented 5 years ago

🤦‍♀️ Yeah...that's me right now. Especially since I've read that GitHub page before.

Thank you for the answers!