qusers / qligfep


How to monitor performance/job completion? #4

Closed: GMdSilva closed this issue 3 years ago.

GMdSilva commented 3 years ago

Hi, first of all, thank you so much for your amazing work putting this together, along with the documentation and tutorials; they helped me a lot.

I am running the FEP tutorial, and everything seems to be working. I have successfully completed the water leg of the simulations, and I have been running the protein leg for 12+ hours on my university's cluster (using 32 cores for each of the 10 jobs).

I was wondering if there is a way to monitor which stage a job is currently in (i.e., equilibration or which of the MD stages), or some sort of performance metric, so I can estimate how long it will take to finish. The files in my FEP1/298/[1-10] folders have not been updated since I submitted the run, but my slurm_[x].log files show no errors, and the jobs are apparently running exactly as they did for the water simulations (although those were much faster, naturally).

Edit: I just figured out I was using the wrong MPI version for the protein simulation, so the software was not even running. With the right MPI version, I can see the files in the folder being updated. Sorry for the confusion!
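Since the practical signal here turned out to be whether files in the replicate folders keep getting updated, a quick way to check this across all replicates is to report the most recently modified file in each one. Below is a minimal sketch, assuming the replicates are the numbered folders under FEP1/298 mentioned above. It deliberately does not rely on any particular qligfep or Q output file names, and the function `latest_activity` is a hypothetical helper, not part of qligfep.

```python
import sys
import time
from pathlib import Path


def latest_activity(run_root):
    """Return (replicate, newest_file, age_seconds) for each numbered subfolder.

    Assumption (hypothetical layout): replicates are folders named 1, 2, ..., 10
    directly under run_root, e.g. FEP1/298/1 ... FEP1/298/10.
    """
    now = time.time()
    root = Path(run_root)
    # Keep only numeric folder names and sort numerically so "10" comes after "9".
    replicates = sorted(
        (d for d in root.iterdir() if d.is_dir() and d.name.isdigit()),
        key=lambda d: int(d.name),
    )
    report = []
    for rep in replicates:
        files = [f for f in rep.iterdir() if f.is_file()]
        if not files:
            # Nothing written yet: the job may not have started (or may not be running at all).
            report.append((rep.name, None, None))
            continue
        newest = max(files, key=lambda f: f.stat().st_mtime)
        report.append((rep.name, newest.name, now - newest.stat().st_mtime))
    return report


def main(argv):
    root = argv[1] if len(argv) > 1 else "FEP1/298"
    if not Path(root).is_dir():
        print(f"{root} is not a directory")
        return
    for name, fname, age in latest_activity(root):
        if fname is None:
            print(f"{name}: no files yet")
        else:
            print(f"{name}: {fname} last modified {age / 60:.1f} min ago")


if __name__ == "__main__":
    main(sys.argv)
```

Run it from the directory that contains FEP1, e.g. `python check_progress.py FEP1/298` (the script name is only an example). If every replicate keeps reporting the same file and its age keeps growing, the job may be sitting in the queue without actually computing anything, as happened here with the wrong MPI version. This only tracks file timestamps. It does not identify which equilibration or MD stage is running.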

Thank you again

jesperswillem commented 3 years ago

Thanks for getting back to us, I'm closing the issue now.