Poor GPU utilisation in toolkit example #1724

tdudgeon commented 10 months ago

Describe the bug
When running the toolkit showcase notebook, I'm seeing very low GPU utilisation. In contrast, when running an MD simulation using only OpenMM tooling (e.g. SystemGenerator and Modeller), close to 100% utilisation is seen. The force fields are not the same (the OpenMM system uses amber/ff14SB.xml, gaff-2.11 and amber/tip3p_standard.xml), but either the force fields or the process of generating the OpenMM System with Interchange seems to result in very sub-optimal performance.

To Reproduce
Run the notebook as described.

Output
nvidia-smi shows GPU utilisation fluctuating, usually in single figures and never more than 15%.
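To put numbers on "single figures" while the notebook runs, a minimal sketch like the one below can sample utilisation, assuming `nvidia-smi` is on the PATH. The helper names are illustrative only and are not part of any toolkit:

```python
import shutil
import subprocess


def parse_utilization(csv_text: str) -> list[int]:
    """Parse `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits`
    output into one integer percentage per GPU."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def sample_gpu_utilization() -> list[int]:
    """Return the current utilisation (%) of each GPU, or [] if nvidia-smi is absent."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,  # raise if nvidia-smi exits with an error (e.g. no driver loaded)
    )
    return parse_utilization(result.stdout)
```

Calling `sample_gpu_utilization()` every second or so from a separate process gives a utilisation trace that can be compared between the Interchange-based and SystemGenerator-based runs.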

Computing environment:
- Ubuntu 23.04
- GeForce RTX 3060 GPU
- Browser: Chrome 116.0.5845.140 (Official Build) (64-bit)
- Python 3.11.5
- Output of running conda list:

mattwthompson commented 10 months ago

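I'm not sure efficient hardware usage is really in the scope of that example. I think the

```python
from openmm import unit as openmm_unit  # `simulation` is built in earlier notebook cells

simulation.context.setVelocitiesToTemperature(300 * openmm_unit.kelvin)  # Maxwell-Boltzmann velocities at 300 K
simulation.runForClockTime(1.0 * openmm_unit.minute)  # run for one minute of wall-clock time
```

cell is just meant to show that it can run and produce sane-looking results to start out a simulation.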

That being said, there isn't any fundamental reason why one system would max out GPU utilization and a similar one wouldn't. The different code paths should be producing objects that differ only in minutiae like the particle parameters. Without CUDA hardware I can't do much more than guess that something is wrong with the platform, e.g. OpenMM isn't seeing the GPU at all.
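One quick way to rule out the platform theory is to list the platforms OpenMM can load. This is a sketch assuming a recent OpenMM installed as the `openmm` package; the helper name `available_platforms` is made up for illustration:

```python
def available_platforms() -> list[str]:
    """Return the names of the OpenMM platforms this installation can use."""
    try:
        import openmm
    except ImportError:
        # OpenMM is not installed in this environment.
        return []
    return [
        openmm.Platform.getPlatform(i).getName()
        for i in range(openmm.Platform.getNumPlatforms())
    ]


platforms = available_platforms()
print("Available OpenMM platforms:", platforms)
if "CUDA" not in platforms:
    print("No CUDA platform found.")
```

For an existing simulation, `simulation.context.getPlatform().getName()` reports which platform was actually selected. If it isn't CUDA, `openmm.Platform.getPlatformByName("CUDA")` can be passed as the platform argument when constructing the `Simulation`.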

ijpulidos commented 10 months ago

I can reproduce it locally with CUDA hardware. I found that the DCDReporter was saving the trajectory too often (every 10 steps). This cripples performance because the simulation spends more time on I/O (saving the trajectory) than on actual computation. If you just increase the reporting interval (to something like 100 or 200 steps), you should see GPU usage go up to ~90% (depending on the GPU, of course).
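A rough sketch of why the interval matters so much: every report pulls positions back from the GPU and writes a frame to disk, and the number of reports scales inversely with the interval. The step count, the atom count, and the ~12 bytes per atom per frame (three 32-bit coordinates, ignoring DCD headers and record markers) are assumptions for illustration, not measurements from the showcase:

```python
# Three 32-bit float coordinates per atom; DCD headers and record markers are ignored.
BYTES_PER_ATOM_PER_FRAME = 12


def reporting_cost(n_steps: int, report_interval: int, n_atoms: int) -> tuple[int, int]:
    """Return (frames written, approximate bytes of coordinates written)."""
    frames = n_steps // report_interval
    return frames, frames * n_atoms * BYTES_PER_ATOM_PER_FRAME


# Hypothetical run: 100,000 steps of a 30,000-atom solvated complex.
for interval in (10, 100, 200):
    frames, n_bytes = reporting_cost(100_000, interval, 30_000)
    print(f"interval={interval:>3}: {frames:>6} frames, ~{n_bytes / 1e6:.0f} MB")
```

In OpenMM the interval is the second argument of `openmm.app.DCDReporter`, e.g. `DCDReporter("trajectory.dcd", 100)` (the filename here is a placeholder).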

I hope this helps answer the question. I don't see a real issue, other than that we might want to increase the reporting interval in the showcase.

tdudgeon commented 10 months ago

Confirmed. The reporting interval is much too low.

mattwthompson commented 10 months ago

Thanks Ivan! That definitely makes sense. I think the reason for such frequent writing is to ensure that a minute on a user's laptop produces enough frames to see that something is moving. I suspect your GPU-equipped workstations can still get a decent-looking trajectory with a more reasonable reporting frequency. We'd want to run this on low-end hardware to see how many steps of that complex can run in a minute of wall time.
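Measuring that on a given machine amounts to timing a short block of steps and extrapolating to the wall-time budget. A minimal sketch, assuming `simulation.step` (or any callable that takes a step count) is passed in; `fake_step` is a hypothetical stand-in so the code runs without OpenMM:

```python
import time
from typing import Callable


def measure_throughput(step: Callable[[int], None], n_steps: int) -> float:
    """Time `step(n_steps)` and return steps per second of wall time."""
    start = time.perf_counter()
    step(n_steps)
    elapsed = time.perf_counter() - start
    return n_steps / elapsed


def steps_in_budget(steps_per_second: float, budget_seconds: float) -> int:
    """Number of whole steps expected to fit in the wall-time budget."""
    return int(steps_per_second * budget_seconds)


def fake_step(n: int) -> None:
    """Stand-in for `simulation.step`: pretend each step takes 0.1 ms."""
    time.sleep(n * 1e-4)


rate = measure_throughput(fake_step, 1_000)
print(f"~{rate:.0f} steps/s, ~{steps_in_budget(rate, 60.0)} steps per minute")
```

With a real GPU simulation, running a short warm-up first is probably wise, since the earliest steps can include one-off setup costs that would skew the estimate low.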

ijpulidos commented 10 months ago

@mattwthompson Yeah, I suspected that's the reason as well. That makes sense.

mattwthompson commented 10 months ago

I think the example could be updated with a brief explanation of why the interval was set to 10 and a suggestion that users increase it to 100, 1000 or so if they have a GPU hooked up. This could probably also be done automatically; I don't feel strongly about either approach.
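If the automatic route were taken, one possible approach (sketched here as an assumption, not taken from the example itself) is to derive the interval from the number of steps expected in the time budget and a target frame count. `choose_report_interval` and the default target of 100 frames are hypothetical:

```python
def choose_report_interval(expected_steps: int, target_frames: int = 100, minimum: int = 10) -> int:
    """Return a reporting interval (in steps) giving roughly `target_frames` frames.

    The interval is rounded down to a multiple of `minimum` and never drops below it,
    so slow hardware keeps the current value of 10 while fast hardware reports less often.
    """
    if target_frames <= 0 or minimum <= 0:
        raise ValueError("target_frames and minimum must be positive")
    interval = expected_steps // target_frames
    return max(minimum, (interval // minimum) * minimum)


# Hypothetical step counts for one minute on a laptop CPU vs. a desktop GPU.
for steps_per_minute in (800, 15_000, 120_000):
    print(steps_per_minute, "->", choose_report_interval(steps_per_minute))
```

Keying the interval on measured throughput rather than on whether a GPU is present means a slow GPU and a fast CPU are both handled without special cases.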

mattwthompson commented 9 months ago

In #1743 (likely to be merged within 12 hours and included in a 0.14.5 release), I've made the following changes that should make the toolkit showcase a bit smoother in that cell:

This should make better use of a GPU if one is found, and there should also be a nice improvement when one isn't. It's pretty cool to find a fix that's wider in scope than the original problem.