Closed tdudgeon closed 9 months ago
I'm not sure efficient hardware usage is really in the scope of that example. I think the
simulation.context.setVelocitiesToTemperature(300 * openmm_unit.kelvin)
simulation.runForClockTime(1.0 * openmm_unit.minute)
cell is just meant to show that it can run and produce sane-looking results to start out a simulation.
That being said, there isn't any fundamental reason why one system maxes GPU utilization and a similar one doesn't. The different code paths should be making objects that only differ in minutiae like the particle parameters. Without CUDA hardware I can't do much more than guess that, e.g., something is wrong with the platform and OpenMM isn't seeing the GPU at all.
I can reproduce it locally with CUDA hardware. I found that the DCDReporter
was saving the trajectory too often (every 10 steps). This cripples performance because the run spends more time on I/O (writing the trajectory) than on actual computation. If you increase the number of steps between trajectory writes (say, to something like 100 or 200), GPU usage goes up to ~90% (depending on the GPU, of course).
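The intuition above can be put into a tiny back-of-the-envelope model (the timings are made-up illustrative numbers, not measurements): if one MD step costs `t_step` of wall time and one trajectory write costs `t_write`, the fraction of time spent actually computing is `interval * t_step / (interval * t_step + t_write)`.

```python
# Illustrative model of why a short reporting interval starves the GPU.
# t_step and t_write are hypothetical costs in arbitrary time units.
def compute_fraction(t_step: float, t_write: float, interval: float) -> float:
    """Fraction of wall time spent on MD steps rather than trajectory I/O."""
    return (interval * t_step) / (interval * t_step + t_write)

# Suppose one trajectory write costs as much as ~50 MD steps (made-up number).
for interval in (10, 100, 200):
    print(interval, round(compute_fraction(1.0, 50.0, interval), 2))
```

Under those assumed costs, reporting every 10 steps leaves only ~17% of wall time for computation, while an interval of 200 brings it to ~80% — roughly the shape of the utilisation jump seen here.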
I hope this helps answer the question. I don't see a real issue here, other than that we might want to increase the number of steps in the reporter for the showcase.
Confirmed. The reporting interval is much too low.
Thanks Ivan! That definitely makes sense. I think the reason for such frequent writing is to ensure that a minute on a user's laptop produces enough frames to see that something is moving. I suspect your workstations hooked up to a GPU can still get a decent-looking trajectory with a more reasonable frequency. We'd want to run this on low-end hardware to see how many steps of that complex can run in a minute of wall time.
@mattwthompson Yeah, I suspected that's the reason as well. That makes sense.
I think the example could be updated with a brief explanation of why it was set to 10 and a suggestion that the user increase it to 100, 1000, or something like that if they have a GPU hooked up. This could probably be done automatically; I don't feel strongly about either approach.
In #1743 - likely to be merged within 12 hours, and included in a 0.14.5 release - I've made a change that should make the toolkit showcase a bit smoother in that cell: the reporting interval is now conditional on whether the CUDA platform is available (i.e. whether
openmm.Platform.getPlatformByName("CUDA")
does not raise an exception), in which case the stride is extended to 1000. I don't have an NVIDIA GPU wired up to my development setup, so this value is just a guess. It's plausible it's still too low, but I expect it to be an improvement over 10, which looks almost comical with the benefit of hindsight and this investigation. This should better utilize a GPU if one is found - and there should also be a nice improvement when one isn't. It's pretty cool to find a fix that's wider in scope than the original problem.
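A minimal sketch of that kind of platform check, assuming the logic described above (the helper names `cuda_available` and `choose_report_interval` are hypothetical, not the actual names in #1743; only `openmm.Platform.getPlatformByName` is real OpenMM API, and the CPU-side value of 10 is just the notebook's original setting):

```python
def cuda_available() -> bool:
    """Return True if OpenMM can see a CUDA platform."""
    try:
        import openmm

        # getPlatformByName raises when no CUDA platform is registered,
        # e.g. on machines without an NVIDIA GPU or CUDA build of OpenMM.
        openmm.Platform.getPlatformByName("CUDA")
        return True
    except Exception:
        return False


def choose_report_interval(has_cuda: bool) -> int:
    # Report rarely on a GPU, where compute far outpaces I/O; report
    # often on a CPU so a one-minute run still yields visible frames.
    return 1000 if has_cuda else 10


interval = choose_report_interval(cuda_available())
```

The resulting `interval` would then be passed as the `reportInterval` when constructing the `DCDReporter`.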
Describe the bug When running the toolkit showcase notebook I'm seeing very low GPU utilisation. In contrast, when running an MD simulation using just OpenMM tooling (e.g. SystemGenerator and Modeller), close to 100% utilisation is seen. The force fields used are not the same (the OpenMM system uses amber/ff14SB.xml, gaff-2.11, and amber/tip3p_standard.xml), but either the force fields or the process of generating the OpenMM System using Interchange seems to result in very sub-optimal performance.
To Reproduce Run the notebook as described.
Output
nvidia-smi
shows GPU utilisation fluctuating, usually in single figures and never more than 15%.
Computing environment (please complete the following information): Ubuntu 23.04, GeForce RTX 3060 GPU, Browser: Chrome 116.0.5845.140 (Official Build) (64-bit), Python 3.11.5. Output of running conda list: