ur-whitelab / hoomd-tf

A plugin that allows the use of Tensorflow in Hoomd-Blue for GPU-accelerated ML+MD
https://hoomd-tf.readthedocs.io
MIT License

Simulations using automatic forces slower than expected? #335

Closed: onehalfatsquared closed this issue 3 years ago

onehalfatsquared commented 3 years ago

Hello again,

After getting the automatic force computations working, I am now trying to simulate my system with HOOMD-TF. I was using CPU builds of HOOMD and HOOMD-TF while debugging, but yesterday I installed a GPU build on my laptop to run a few full simulations before possibly moving this to a cluster. HOOMD itself sped up noticeably in going from CPU to GPU, but HOOMD-TF remains about the same. I was wondering if this is expected or if something strange is happening.

I have the results from the HOOMD profiler for the same simulation, performed with and without HOOMD-TF:

HOOMD Average TPS: 5855.38

    Simulation: 1.7078s | 100.000%
        Dump GSD:        0.0006s | 0.034%
        Integrate:       0.5754s | 33.694%
            Langevin step 1: 0.1740s | 10.186%
            Langevin step 2: 0.2178s | 12.756%
            Net force:       0.1443s | 8.450%
            Sum accel:       0.0000s | 0.000%
            Self:            0.0393s | 2.301%
        Neighbor:        0.2107s | 12.336%
            Cell:            0.0169s | 0.991%
                compute: 0.0048s | 0.279%
                init:    0.0101s | 0.593%
                Self:    0.0020s | 0.119%
            compute:         0.0299s | 1.750%
            dist-check:      0.1520s | 8.900%
            head-list:       0.0027s | 0.156%
            Self:            0.0092s | 0.539%
        Pair lj:         0.1087s | 6.366%
        Pair morse:      0.1928s | 11.287%
        SFCPack:         0.1993s | 11.670%
        constrain_rigid: 0.3817s | 22.351%
            sum force and torque: 0.1898s | 11.112%
            update:               0.1682s | 9.847%
                init molecules: 0.0084s | 0.492%
                Self:           0.1598s | 9.355%
            Self:                 0.0238s | 1.392%
        Self:            0.0387s | 2.264%

HOOMD-TF Average TPS: 217.553

    Simulation: 45.9657s | 100.000%
        Dump GSD:        0.0007s | 0.001%
        Integrate:       0.8174s | 1.778%
            Langevin step 1: 0.2257s | 0.491%
            Langevin step 2: 0.3383s | 0.736%
            Net force:       0.2002s | 0.435%
            Sum accel:       0.0000s | 0.000%
            Self:            0.0532s | 0.116%
        SFCPack:         0.2012s | 0.438%
        TensorflowCompute: 43.0852s | 93.733%
            Neighbor:        0.2545s | 0.554%
                Cell:        0.0168s | 0.037%
                    compute: 0.0051s | 0.011%
                    init:    0.0095s | 0.021%
                compute:     0.0380s | 0.083%
                dist-check:  0.1588s | 0.346%
                head-list:   0.0038s | 0.008%
            TensorflowCompute::Force Update:       0.0884s | 0.192%
            TensorflowCompute::reshapeNeighbors:   0.0920s | 0.200%
            TensorflowCompute::Awaiting TF Update: 41.9893s | 91.349%
            Self:            0.6611s | 1.438%
        TensorflowCompute::Awaiting TF Pre-Update: 1.2645s | 2.751%
        constrain_rigid: 0.5171s | 1.125%
            sum force and torque: 0.2954s | 0.643%
            update:               0.1930s | 0.420%
                init molecules: 0.0093s | 0.020%
                Self:           0.1837s | 0.400%
        Self:            0.0796s | 0.173%
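For reference, a profile like the ones above can be generated with roughly the quickstart-style setup from the hoomd-tf docs plus profile=True on the run call. This is a simplified sketch; the toy LJ model, lattice, and parameters below are placeholders rather than my actual script:

    import hoomd
    import hoomd.md
    import hoomd.htf as htf
    import tensorflow as tf

    # toy Lennard-Jones model standing in for the actual potentials
    class LJModel(htf.SimModel):
        def compute(self, nlist):
            rinv = htf.nlist_rinv(nlist)                       # 1/r per padded neighbor (0 for padding)
            inv_r6 = rinv ** 6
            p_energy = 4.0 / 2.0 * (inv_r6 * inv_r6 - inv_r6)  # halve to undo double counting
            energy = tf.reduce_sum(p_energy, axis=1)
            return htf.compute_nlist_forces(nlist, energy)

    model = LJModel(32)                                        # 32 = max neighbors
    tfcompute = htf.tfcompute(model)
    hoomd.context.initialize('')
    system = hoomd.init.create_lattice(unitcell=hoomd.lattice.sc(a=2.0), n=7)
    nlist = hoomd.md.nlist.cell()
    hoomd.md.integrate.mode_standard(dt=0.005)
    hoomd.md.integrate.langevin(group=hoomd.group.all(), kT=1.0, seed=42)
    tfcompute.attach(nlist, r_cut=3.0)

    # profile=True makes HOOMD print the hierarchical timing breakdown shown above
    hoomd.run(1e4, profile=True)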

Most of the time is spent in "Awaiting TF Update", but I'm uncertain what that means. I am assuming it is some kind of communication between HOOMD and TensorFlow, which I thought would be quick if both are executing on the GPU. Does this indicate something is going wrong?

If it helps at all, this was performed for a system with 300 particles, but I notice similarly slow speeds for smaller (30 particle) and larger (3000 particle) systems. The GPU used is a GTX 1060M.

whitead commented 3 years ago

The "Awaiting" is the underlying GPU computation.

There are two reasons it is slower. The first is that HOOMD will always be faster at computing forces because its updates are hand-written kernels. The second is that, unfortunately, some of the CUDA operations TensorFlow relies on, such as complex indexing patterns, are not well accelerated on commodity chips. One thing you can try is XLA (I believe it's in the documentation), which will attempt to simplify the computation graph. We turn it off by default because it can sometimes affect reproducibility, but you can compare runs with it on and off. I think XLA can be enabled with an environment flag; double-check the TensorFlow docs.
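For example, something along these lines should turn it on globally. This is not hoomd-tf-specific, so double-check the exact flag names against the TensorFlow docs for your version:

    import os

    # Option 1: request XLA auto-clustering before TensorFlow is imported.
    # Equivalent to running with TF_XLA_FLAGS=--tf_xla_auto_jit=2 on the command line.
    os.environ["TF_XLA_FLAGS"] = "--tf_xla_auto_jit=2"

    import tensorflow as tf

    # Option 2: ask TensorFlow's graph optimizer to JIT-compile with XLA (TF 2.x).
    tf.config.optimizer.set_jit(True)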

onehalfatsquared commented 3 years ago

Ok, that's fine as long as everything is running as intended.

I tried enabling the XLA flag, but the forces keep coming back as NaNs. A Google search revealed that XLA is not compatible with certain TensorFlow functions, specifically tf.where(), which I use in the construction of my potentials.
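For illustration, the kind of tf.where construction I mean is something like this (a simplified, hypothetical stand-in, not my actual potentials; it masks the padded zero-distance neighbor entries before dividing):

    import tensorflow as tf

    def pairwise_energy(r, r_cut=3.0, epsilon=1.0, sigma=1.0):
        # hoomd-tf pads the neighbor list, so some r entries are 0; mask those
        # and anything beyond the cutoff
        mask = (r > 0.0) & (r < r_cut)
        # divide by a safe value where masked out to avoid inf/NaN
        r_safe = tf.where(mask, r, tf.ones_like(r))
        inv_r6 = (sigma / r_safe) ** 6
        energy = 4.0 * epsilon * (inv_r6 * inv_r6 - inv_r6)
        # zero the contribution from padded/out-of-range entries
        return tf.where(mask, energy, tf.zeros_like(energy))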

Thanks for the speedy reply.