Closed ramess101 closed 4 years ago

I apologize if GitHub Issue tracking is not the appropriate location for this type of dialogue. If there is a forum or group page where such questions should be posted instead, please let me know.
Now that I have been able to complete the tutorial, I am curious about the expected computer time required for each trajectory. I had originally anticipated that the calculation for a single trajectory from the tutorial would be quite fast for several reasons: the relatively short trajectory length (100 fs with 0.5 fs timesteps = total of 200 steps), the low level of theory (CASSCF), small basis set (cc-pVDZ), small molecule (CH2NH2), etc. Instead, each trajectory required approximately 12 hours to complete on a single CPU, Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10GHz. Is this a typical run time for a single trajectory from the tutorial? If not, what might be the bottleneck? I would assume it is the QM calculations with OpenMOLCAS, in which case I am not sure how to improve performance.
I just checked, and the runs that I did while writing the tutorial complete a time step in about 15 seconds, so the 200 steps take about 50 minutes. That is on a Core i5-2400 @ 3.1 GHz from 2011. So your run time does indeed seem to suffer from some bottleneck.
The most obvious guess for the bottleneck is disk I/O. If you use a network-mounted file system with low bandwidth or high latency for the "scratchdir", then MOLCAS performance will probably suffer quite a lot. In my case, I used the local hard drive of my computer as the scratchdir. This works quite well because it is not affected by other users, unlike a large global cluster file system that is busy with dozens (or more) of other jobs. So you could set up the SHARC trajectories on the global file system, but use some fast local storage as the scratchdir. Don't forget that you can use environment variables when you set up the scratchdir location with setup_traj.py/within MOLCAS.resources.
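For example, a minimal sketch of the relevant line in MOLCAS.resources, assuming the cluster exposes fast node-local storage through an environment variable such as $TMPDIR (the variable name is site-specific, so check what your scheduler actually provides):

```
# MOLCAS.resources -- point the MOLCAS scratch to fast node-local storage
# ($TMPDIR is an assumption; substitute the variable your cluster defines)
scratchdir $TMPDIR/sharc_molcas_scratch
```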
If that does not help, then you can send me a zip containing all your output files (SHARC input, output.*, content of the QM/ directory, content of scratchdir). Email probably works best for this.
@maisebastian
OK, 50 minutes is a lot closer to what I was expecting. I will have to work with my system administrators to see if we can get to the bottom of this. I noticed that only ~10-20% of the CPU was actually being utilized at any given time, so I am going to look into that first.
I will get back to you later.
Thanks again
@maisebastian
I believe I got to the bottom of this.
1) As you suggested, I was using the wrong scratch space. I was mistakenly using our /scratch directory for "scratchdir" (you can understand the confusion), when there is actually a node-local /tmp directory that provides very fast I/O. Having changed scratchdir to /tmp, the tutorial calculations now complete in about an hour.
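For reference, a quick way to confirm that /tmp (or whatever directory scratchdir points at) is really node-local storage rather than a network mount is to check the filesystem type behind it; these are standard Linux commands, and the exact output varies by system:

```
# filesystem type backing /tmp: ext4/xfs/tmpfs usually mean node-local,
# while nfs/lustre/gpfs indicate a network or parallel file system
df -T /tmp
findmnt -T /tmp
```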
2) The low CPU usage was because the "top" command only refreshes every 3 seconds by default. This is misleading because SHARC calls many different modules that run for just a few seconds. Changing the top refresh rate to 0.5 seconds shows that each module is utilizing around 100% of a CPU, as desired.
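For anyone else who runs into this, the refresh interval can also be set directly when launching top:

```
# refresh every 0.5 s instead of the default 3 s (procps top, -d = delay)
top -d 0.5
```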
Thanks for the guidance!
Closing issue