mllg / batchtools

Tools for computation on batch systems
https://mllg.github.io/batchtools/
GNU Lesser General Public License v3.0

Collect MaxRSS/Elapsed from sacct when using SLURM? #158

Open kendonB opened 6 years ago

kendonB commented 6 years ago

I imagine a widespread problem on HPC systems is not knowing how much memory or walltime to request. I suspect that misjudging these amounts (memory especially) wastes huge amounts of resources on HPC systems. I know it does in my workflow.

SLURM reports MaxRSS and Elapsed values for each job after completion (available through sacct), which batchtools could store somewhere when jobs finish.
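
To make the idea concrete, here is a minimal sketch of what collecting these values could look like. It is written in Python purely for illustration and is not batchtools code. The helper names (`parse_maxrss`, `parse_elapsed`, `summarize`, `query_sacct`) are hypothetical. The sketch assumes that sacct is called with `--format=JobID,MaxRSS,Elapsed --parsable2 --noheader`, that MaxRSS may be empty on the allocation line and filled in on step lines such as `.batch`, and that taking the maximum across steps is a reasonable per-job summary.

```python
import subprocess

# Multipliers for the size suffixes sacct may append to MaxRSS (e.g. "2048K").
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_maxrss(value):
    """Convert a MaxRSS string such as '2048K' to bytes; empty string -> None."""
    value = value.strip()
    if not value:
        return None
    suffix = value[-1].upper()
    if suffix in _UNITS:
        return int(float(value[:-1]) * _UNITS[suffix])
    return int(float(value))


def parse_elapsed(value):
    """Convert an Elapsed string like '[DD-]HH:MM:SS' (or 'MM:SS') to seconds."""
    days, _, clock = value.strip().rpartition("-")
    parts = [int(float(p)) for p in clock.split(":")]
    while len(parts) < 3:  # pad missing hours/minutes with zeros
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return int(days or 0) * 86400 + hours * 3600 + minutes * 60 + seconds


def summarize(sacct_text):
    """Reduce pipe-separated 'JobID|MaxRSS|Elapsed' lines to one summary.

    Assumption: the maximum over all steps is the figure of interest.
    """
    max_rss, elapsed = None, 0
    for line in sacct_text.splitlines():
        if not line.strip():
            continue
        _job_id, rss, wall = line.split("|")[:3]
        rss_bytes = parse_maxrss(rss)
        if rss_bytes is not None:
            max_rss = rss_bytes if max_rss is None else max(max_rss, rss_bytes)
        elapsed = max(elapsed, parse_elapsed(wall))
    return {"max_rss_bytes": max_rss, "elapsed_seconds": elapsed}


def query_sacct(job_id):
    """Run sacct for one finished job and summarize it (requires SLURM)."""
    out = subprocess.run(
        ["sacct", "-j", str(job_id), "--format=JobID,MaxRSS,Elapsed",
         "--parsable2", "--noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return summarize(out)
```

Something like `query_sacct(batch_id)` could then run once a job has finished, with the result stored alongside the job's other metadata. Where batchtools would keep it is an open question.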

Eventually, I'm imagining a nice function exported from future.batchtools or drake that could report on these values.