running moltemplate on large systems is slow and needs a lot of memory

hnadeem2 commented 2 years ago

Hi Andrew,

I have a MARTINI-based coarse-grained simulation in which the box is large. There are 3 species of atoms totaling around 12 million atoms. Moltemplate always crashes partway through the run, and it takes forever. Is it possible to run it on multiple cores with mpi/mpiexec?

Hassan

jewettaij commented 2 years ago

Hi Hassan,

There is currently no way to run moltemplate in parallel. I think memory limitations are likely the main problem you are having. Moltemplate requires between 3 GB and 12 GB of RAM per 10^6 atoms in your simulation. This is very wasteful, but it's difficult to fix. (See historical details below.) The large amount of memory used by moltemplate also contributes to the long time it takes to run. (Computers run slowly when they are low on RAM and using swap.) To work around these issues, you have a few options:

1) Here's a weird hack which can reduce memory usage and time: If possible, divide your system into smaller pieces. For example, divide your simulation into 8 identical pieces, each of which is half as large in the X, Y, and Z directions. Use moltemplate.sh to create LAMMPS files describing one of these small pieces ("system.data", "system.in.init", "system.in.settings", and "system.in.charges" if present). Then use ltemplify.py to convert these files into a single LT file (e.g. "subsystem.lt"). Then move your original files into another directory and create a new "system.lt" file which makes 8 copies of this "subsystem.lt". (I'm passing out now, but if I remember tomorrow, I'll edit this post to add some more details on how this is done.) If you run out of memory again during this step, try using TopoTools to load the DATA file for the subsystem and duplicate it. (TopoTools might be a lot faster than moltemplate as well.)

2) Alternatively, rent time on a machine with at least 128 GB of RAM (preferably 256 GB). Such computers can be rented from Amazon. (If I remember correctly, the price was about a dollar per hour, and cheaper if you use "spot pricing".)

3) You can reduce the time it takes to run moltemplate by running moltemplate.sh with the "-nocheck" argument. This will not reduce memory usage, but it should reduce the run time by roughly a factor of 2. Unfortunately, this also disables the syntax-error messages, which are very useful when you are designing your simulation. So only use this argument when you are certain there are no errors in your LT files. If you do this, I suggest you build a much smaller version of your system first and make sure there are no errors. Then, when you are ready to build the full-size system, run moltemplate with the "-nocheck" argument.
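The last step of option 1 (a new "system.lt" that makes 8 copies of the subsystem) might look roughly like the sketch below. It is only an illustration, not a tested recipe: the object name "SubSystem", the helper function, and the box lengths are all hypothetical, and it assumes the object defined in "subsystem.lt" can be tiled with moltemplate's array syntax by shifting each copy one subsystem box length. Setting the boundaries of the full-size periodic box is not shown.

```python
def make_tiled_system_lt(sub_box, object_name="SubSystem", lt_file="subsystem.lt"):
    """Return LT text that creates 2 x 2 x 2 = 8 shifted copies of a subsystem.

    sub_box     -- (lx, ly, lz): box lengths of the small piece (hypothetical values)
    object_name -- name of the object defined inside lt_file (an assumption)
    lt_file     -- LT file produced from the small piece, e.g. by ltemplify.py
    """
    lx, ly, lz = sub_box
    lines = [
        f'import "{lt_file}"',
        "",
        "# 8 copies, each displaced by one subsystem box length in X, Y, and Z.",
        f"pieces = new {object_name} "
        f"[2].move({lx},0,0) [2].move(0,{ly},0) [2].move(0,0,{lz})",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example with a made-up 100 x 100 x 100 subsystem box.
    print(make_tiled_system_lt((100.0, 100.0, 100.0)))
```

The printed text would be saved as the new "system.lt" and passed to moltemplate.sh as usual.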

History

Unfortunately, when I wrote moltemplate, I was not aware of the large memory overhead required by Python. In Python, every object you instantiate (including a tiny molecule containing a single atom) requires about a kilobyte of memory. (After some effort, I think I was able to bring this down to about 300 bytes using "__slots__".) I started out running simple, small coarse-grained simulations using moltemplate. I was not thinking of running huge simulations at the time. But moltemplate has grown more than I thought it would. I am honestly flattered that moltemplate has been so successful that these kinds of questions even come up. But I'm sorry it creates headaches for you.
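For readers unfamiliar with the "slots" trick mentioned above, here is a minimal, generic sketch of how `__slots__` removes the per-instance attribute dictionary. The two classes are hypothetical and are not moltemplate's own; the sizes reported are rough and vary between Python versions.

```python
import sys


class PlainAtom:
    """Ordinary class: each instance carries its own attribute dictionary."""

    def __init__(self, atom_id, atom_type, charge):
        self.atom_id = atom_id
        self.atom_type = atom_type
        self.charge = charge


class SlottedAtom:
    """Same fields, but __slots__ replaces the per-instance dictionary."""

    __slots__ = ("atom_id", "atom_type", "charge")

    def __init__(self, atom_id, atom_type, charge):
        self.atom_id = atom_id
        self.atom_type = atom_type
        self.charge = charge


def approx_size(obj):
    """Shallow size of obj plus its attribute dictionary, if it has one."""
    size = sys.getsizeof(obj)
    if hasattr(obj, "__dict__"):
        size += sys.getsizeof(obj.__dict__)
    return size


if __name__ == "__main__":
    print("plain:  ", approx_size(PlainAtom(1, 2, 0.0)), "bytes")
    print("slotted:", approx_size(SlottedAtom(1, 2, 0.0)), "bytes")
```

This only measures the shallow size of one small object, so it understates the footprint of real objects that hold lists, strings, and references to other objects.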

jewettaij commented 2 years ago

I just re-read your message and I noticed you are using the MARTINI force field. I'm curious to know how you prepared your simulation. Did you use the MARTINI 2 files that come with moltemplate? Or did you download files from the MARTINI (3) web site (in GROMACS format) and convert them into moltemplate format? (I feel somewhat guilty that I have not provided more support for MARTINI users. Eventually, I'd like to write a script to convert the most recent GROMACS files into moltemplate format. If you have such a script, please share it.)

hnadeem2 commented 2 years ago

Apologies for the late response. The easiest solution for me was to run it on a machine with 128 GB of RAM. It took some time, but it ran well. I'm actually using MARTINI 3 downloaded from the website. Although my system was large, the largest molecule was coarse-grained to only 4 CG atoms, so I built the .lt files by hand (got lazy). If my next system is more complex, I will definitely write something for that.
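As a back-of-the-envelope check, the figure quoted earlier in this thread (between 3 GB and 12 GB of RAM per 10^6 atoms) can be applied to a system of about 12 million atoms. The function name below is made up for illustration, and the result is only as good as that rough per-atom figure.

```python
def moltemplate_ram_estimate_gb(n_atoms, gb_per_million=(3.0, 12.0)):
    """Return a (low, high) RAM estimate in GB.

    gb_per_million holds the rough range quoted in this thread:
    3 GB to 12 GB of RAM per 10^6 atoms.
    """
    low, high = gb_per_million
    millions = n_atoms / 1e6
    return low * millions, high * millions


if __name__ == "__main__":
    low, high = moltemplate_ram_estimate_gb(12_000_000)
    print(f"Estimated RAM: {low:.0f} GB to {high:.0f} GB")
    # prints "Estimated RAM: 36 GB to 144 GB"
```

A 128 GB machine falls inside that range, which is consistent with the run completing but taking some time.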

jewettaij commented 2 years ago

Thank you very much Hassan for getting back to me. If I have a chance to work on an ITP file converter that might benefit MARTINI users, I'll let you know. Take care -Andrew

jewettaij commented 2 years ago

I think I'll reopen this in case anyone else has the same question.

jewettaij / moltemplate

running moltemplate on large systems is slow and needs a lot of memory #82
