markovmodel / deeptime

Deep learning meets molecular dynamics.
GNU Lesser General Public License v3.0
175 stars 39 forks

training very slow on GPU #35

Open jiayeguo opened 4 years ago

jiayeguo commented 4 years ago

Hi, I am trying to reproduce your results in Alanine_dipeptide_multiple_files on a single NVIDIA GeForce GTX 1080 Ti GPU, and it took ~5 h to finish all 10 attempts. I was using tensorflow-gpu v1.9.0, cuda/9.0, and cudnn/7.0. For comparison, I also ran the Jupyter notebook on my laptop CPU, and it was actually faster than the GPU (~3 h, but still very slow!). In the Nature Comm. paper you mention that, depending on the system, each run takes between 20 s and 180 s. Since I didn't change the code, I am wondering why there is such a big discrepancy in speed compared to the paper. Do you have any insight into why my training is so slow? Thanks!

amardt commented 4 years ago

Hi, the reason for the slow speed is that in this notebook we don't load the data into memory before training; instead, it is read from the hard drive for every batch. This is meant to simulate the situation where the whole dataset does not fit into memory. However, reading from the hard drive is slow, and if you are training on the GPU, every batch additionally has to be transferred to the device, which I suspect is why it is even slower than your CPU run. The time is consumed by loading and transferring data, not by computing. For the paper we simply used only one trajectory and loaded it into memory before training (see the notebook without multiple files).

Anyhow, a colleague of mine is developing a new library with a PyTorch implementation of VAMPnets, which will be more up to date. I will post a link here as soon as it is released. I hope this answers your question!

Best, Andreas
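To make the difference concrete, here is a minimal sketch (not the notebook's actual code; the file name, array shapes, and batch size are invented for illustration) contrasting batch-wise reads from disk with a single up-front load:

```python
import numpy as np

# Write a fake trajectory to disk so the sketch is self-contained
# (the name, length, and feature count are made up).
rng = np.random.default_rng(0)
np.save("traj.npy", rng.normal(size=(20_000, 30)).astype(np.float32))

def batches_from_disk(path, batch_size=1024):
    """What the multiple-files notebook effectively does: every batch
    goes back to the hard drive (and then to the GPU, if one is used)."""
    n_frames = np.load(path, mmap_mode="r").shape[0]
    for start in range(0, n_frames, batch_size):
        frames = np.load(path, mmap_mode="r")[start:start + batch_size]
        yield np.asarray(frames)  # copy the slice into memory

def batches_in_memory(traj, batch_size=1024):
    """The paper's setup: load once before training, then just slice."""
    for start in range(0, len(traj), batch_size):
        yield traj[start:start + batch_size]

traj = np.load("traj.npy")               # single up-front read
disk = list(batches_from_disk("traj.npy"))
mem = list(batches_in_memory(traj))
```

Both routes yield identical batches; only the I/O pattern differs, and with a real MD dataset the repeated per-batch reads (plus host-to-GPU transfers) dominate the wall time.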

jiayeguo commented 4 years ago

Thanks for the clarification! That makes sense. Looking forward to trying out the PyTorch version. Best, Jiaye

clonker commented 4 years ago

Hi Jiaye,

colleague developing the new library here. Coincidentally, it is also called deeptime. If you are feeling adventurous and want to play around with it, you can find it here: https://github.com/deeptime-ml/deeptime (and documentation for vampnets in the new deeptime). I have set up a small notebook for you demonstrating how to use it to train vampnets. Training takes 2:00-2:30 min on my machine for 60 epochs. There are two training routines: the 2:30 one is more high-level and easier to use, while the 2:00 one is optimized for data that can be held in memory in its entirety.
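For anyone curious what the training actually optimizes: VAMPnets maximize the VAMP-2 score of the network outputs on time-lagged pairs of frames. Here is a from-scratch NumPy sketch of that score (an illustration of the idea, not deeptime's actual code, which works on PyTorch tensors so the score can be backpropagated):

```python
import numpy as np

def _inv_sqrt(c, eps=1e-10):
    """Inverse matrix square root of a symmetric PSD matrix,
    with small eigenvalues clipped for numerical stability."""
    evals, evecs = np.linalg.eigh(c)
    evals = np.clip(evals, eps, None)
    return evecs @ np.diag(evals ** -0.5) @ evecs.T

def vamp2_score(chi_0, chi_t):
    """VAMP-2 score of network outputs chi_0 = chi(x_t), chi_t = chi(x_{t+tau})."""
    n = chi_0.shape[0]
    chi_0 = chi_0 - chi_0.mean(axis=0)        # mean-center both blocks
    chi_t = chi_t - chi_t.mean(axis=0)
    c00 = chi_0.T @ chi_0 / (n - 1)           # instantaneous covariances
    ctt = chi_t.T @ chi_t / (n - 1)
    c0t = chi_0.T @ chi_t / (n - 1)           # time-lagged covariance
    k = _inv_sqrt(c00) @ c0t @ _inv_sqrt(ctt)
    return np.linalg.norm(k, "fro") ** 2 + 1  # +1 for the constant singular function
```

During training, the network weights are updated to maximize this score (equivalently, minimize its negative), so the learned features capture the slowest dynamics at the chosen lag time.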

Cheers, Moritz

ala2-vampnets.zip

jiayeguo commented 4 years ago

Hi Moritz! Thanks for pointing me to this new repo. I will take a look and play around with it. Best, Jiaye