sntioudis / papreca

PAPRECA hybrid off-lattice kinetic Monte Carlo/molecular dynamics (kMC/MD) simulator
GNU General Public License v2.0
6 stars 0 forks source link

Comments to paper #15

Closed jfaraudo closed 5 months ago

jfaraudo commented 5 months ago

The paper looks nice. I have only a few comments:

jfaraudo commented 5 months ago

I have another comment about the JOSS draft paper. After reading the submitted paper, it was not obvious to me why kMC and MD were combined in the way you did here. After reading your previous paper in Computational Materials Science, vol. 229, 2023, that was obvious to me. You mention clearly in the pevious paper that the mission of the MD is only to take into account high frequency / short time effects such as molecular vibration, small changes in bond distances and molecular flexibility and thinks like that, whereas the main kinetics of the effects to be studied were considered using kMC from "frozen" MD configurations. I think that a short explanation in this line, like the one in your paper (for ex. in page 3 below the flow chart), will be a great addition to the Statement of need section of the JOSS paper.

sntioudis commented 5 months ago

@jfaraudo

The authors mention at several points the generality of the method, and in the statement of need, they include examples such as protein folding or enzyme reactions that require consideration of solvent, in order to be simulated properly. I am not completelly sure that the present method works efficiently in presence of solvent. In my experience, explicit solvent makes MC methods quite inefficient (pls note that I have still to run the examples, so I admit that I can be wrong). For me it is clear that in situations such as solid state simulations , molecules in contact with solids and thinks like this the method is perfect but I think that a comment whether this method is recommended or not in simulations with explicit solvent is in order.

I took a couple of days to think about this and I believe that the presence of solvent does not directly affect the performance of kinetic Monte Carlo codes. Specifically, the kMC stage of PAPRECA definitely becomes more expensive if:

1) The number of atoms increases. If this happens then more event searches have to be performed. 2) The number of predefined events increases. In this case, once again, the number of event searches rises. 3) The number of atoms included in the neighbors lists elevates. In essence, this would mean that the system is "denser" which tranlates to more expensive event searches (e.g., when we are looking for bond-pairs or checking for collisions in diffusion or adsorption events).

I reckon that the presence of solvent can make the kMC stage of the simulation more expensive, but indirectly. In other words, explicitly including the solvent introduces atoms to the system and potentially the neighbors lists. Nevertheless, simulating a very dense solid/vapour interface might become computationally demanding for the exact same reasons (i.e., if the film is too "dense" or the interface/surface area is too large).

I admit that the MD stage of a hybrid PAPRECA run (i.e., kMC+MD) should be impacted in some way by the presence of solvent. Nonetheless, this is a LAMMPS topic and not necessarily something relevant with PAPRECA or kMC simulations including solvents.

I performed the following simulations as a sanity check.

1) Unsolvated tri-m-cresyl molecules in a box (5452 atoms).

tcp

Final stats: Total KMC walltime=        11.27842874 sec (0.60280958%) 
             Total MD walltime=         1859.69859454 sec (99.39719042%) 
             Total HYBRID KMC/MD time=  1870.97702328 sec

TCP_unsolvated.zip

2) Mixture of tri-m-cresyl molecules and toluene (solvent) molecules (5440 atoms).

tcp_toluene

Final stats: Total KMC walltime=        4.20634362 sec (1.52989671%) 
             Total MD walltime=         270.73663669 sec (98.47010329%) 
             Total HYBRID KMC/MD time=  274.94298030 sec

TCP_TOLUENE.zip

Note that both systems were simulated using the same processor gri( 2x2x4), the exact same predefined events, simulation box settings, and interatomic potential. Even though both simulations had (almost) equal number of atoms, we observe that the kMC stage of the unsolvated system was about 3 times faster than the relevant stage of the solvated system. To the best of my understanding, this can be attributed to the fact that the unsolvated system formed agglomerates, elevating the number of atoms in the neighbors lists. In turn, this led to a rise in the number of "neighboring atom searches" of the bond formation events defined in the system (i.e., P-O, O-O, and C-O). Conclusively, explicitely including the solvent atoms in a system does not guarantee that the computational cost of the kMC stage (or even the MD stage in some cases) would rise.

I am attaching compressed files with the input files, in case you want to run this.

jfaraudo commented 5 months ago

Thank you for considering these tests!. Traditionally MC is not very popular with people working with systems in solvent because for most solvents (especially for water) it is very hard to find "cavities" able to allocate a molecule jumping from one place to another one. I mean, if you have for example a molecule diffusing from one place to another in solvent, to simulate this motion with MC you need a low energy place to put this molecule without overlaping the solvent. Overlaping the solvent will have an incredibly large energy cost in a MC move , so it makes the probability of the MC move almost zero. In your case, with the kMC you will have for sure overlaps with solvent molecules when you move your solute so then you need to minimize again energy (OK, something that you do in your PAPRECA + LAMMPS workflow) to move the overlaped solvent molecules somewhere. That step could be also hard to converge if the liquid is dense (for example, water, which is quite dense, even more than its solid phase, you know). The test you show are interesting. Organic solvents as you considered are also very important. Since these systems are very different from anything alse considered in PAPRECA, my suggestion is to add these somewhere as examples of runs with solvent or something like that. No need for extensive documentation (not at the level of the examples 1,2 and 3), I think it can be provided just as a proof of concept that the methods can be used in the presence of solvent. For people working with solvents it will be of interest to see that.

sntioudis commented 5 months ago

Thank you for your input and I certainly agree with all of the points made. To the best of my understanding, the performance of the PAPRECA simulation would really depend on the system. What worked ok with the TCP+toluene example might not perform so well in a water based system (potentially due to the elevated cost of energy minimization as you correctly mention).

my suggestion is to add these somewhere as examples of runs with solvent or something like that. No need for extensive documentation (not at the level of the examples 1,2 and 3), I think it can be provided just as a proof of concept that the methods can be used in the presence of solvent. For people working with solvents it will be of interest to see that.

Fair point. I will include the examples to the repository.

sntioudis commented 5 months ago

@jfaraudo I included a solvents example in the documentation and uploaded the relevant files in the repository.

I will be addressing the rest of the paper issues in a bit!

sntioudis commented 5 months ago

@jfaraudo

This code implements hybrid kMC/MD simulations by combining the new kMC code with the LAMMPS MD engine. The concepts reminds me a bit that of py-MCMD (code : https://github.com/GOMC-WSU/py-MCMD paper: J. Chem. Theory Comput. 2022, 18, 8, 4983–4994 , https://pubs.acs.org/doi/10.1021/acs.jctc.1c00911 ) . In this previous code the authors combine MC with MD using the NAMD engine (instead of LAMMPS). An important difference is that in py_MCMD the MC method is a Grand Canonical MC , aimed at adding/removing molecules and atoms from the system following Grand Canonical ensemble. Here the method is kMC , aimed at the speed up of kinetic processes that are too slow in a direct MD simulation. I think this previous method has to be mentioned (and very briefly compared with the present approach) in the paper.

I have another comment about the JOSS draft paper. After reading the submitted paper, it was not obvious to me why kMC and MD were combined in the way you did here. After reading your previous paper in Computational Materials Science, vol. 229, 2023, that was obvious to me. You mention clearly in the pevious paper that the mission of the MD is only to take into account high frequency / short time effects such as molecular vibration, small changes in bond distances and molecular flexibility and thinks like that, whereas the main kinetics of the effects to be studied were considered using kMC from "frozen" MD configurations. I think that a short explanation in this line, like the one in your paper (for ex. in page 3 below the flow chart), will be a great addition to the Statement of need section of the JOSS paper.

I believe both issues above are now addressed in the revised version of the paper:

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

jfaraudo commented 5 months ago

Yes, thank you!