isayevlab / Auto3D_pkg

Auto3D generates low-energy conformers from SMILES/SDF
MIT License

Dataset Generation #61

xiaolinpan closed 9 months ago

xiaolinpan commented 9 months ago

Thank you for open-sourcing Auto3D. It's a useful tool for drug discovery.

I'm new to neural network potential development and have some questions about dataset generation. You described the workflow for nonequilibrium conformation sampling with GFN2-xTB and energy calculation with B97-3c:

"We carried out the nonequilibrium conformation generation process using GFN2-XTB molecular dynamics at 400 K for 20 ps. The optimized structures were selected for energy calculations with the B97-3c composite scheme in ORCA."

Do the non-equilibrium conformations sampled from the simulation trajectory need to be geometry-optimized with B97-3c before the single-point energy is calculated? Or is the single-point energy calculated directly on the non-equilibrium conformations sampled from the trajectories?

If geometry optimization is applied before the single-point energy calculation, do constraints (for example, dihedral constraints) need to be added during the optimization?

Looking forward to your reply. Thank you very much.

LiuCMU commented 9 months ago

Hello, thanks for your interest!

The non-equilibrium conformations were used directly for the B97-3c single-point energy (SPE) and force calculations, without optimization. This is intentional, because the NNP needs to learn the energies and forces of non-equilibrium structures. Each NNP in the ANI and AIMNet series uses a different dataset, but the general dataset generation process is similar. The ANI-1 data, along with the general preparation process, is described in this paper: https://www.nature.com/articles/sdata2017193
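As a rough illustration of "use the sampled frames directly", here is a minimal sketch that reads a multi-frame XYZ trajectory and writes one ORCA input per selected frame, requesting a B97-3c energy and gradient with no optimization step. It is not the pipeline actually used for the ANI or AIMNet datasets. The helper names (`read_xyz_frames`, `orca_engrad_input`, `inputs_from_trajectory`), the `stride` of 10, and the default charge and multiplicity are all assumptions made for this example.

```python
from typing import List, Tuple

# One frame = (element symbols, Cartesian coordinates in Angstrom)
Frame = Tuple[List[str], List[Tuple[float, float, float]]]


def read_xyz_frames(text: str) -> List[Frame]:
    """Parse a multi-frame XYZ string into a list of frames (hypothetical helper)."""
    lines = text.splitlines()
    frames: List[Frame] = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():  # tolerate blank lines between frames
            i += 1
            continue
        n_atoms = int(lines[i].split()[0])
        # Skip the atom-count line and the comment line, then read n_atoms rows.
        block = lines[i + 2 : i + 2 + n_atoms]
        symbols, coords = [], []
        for row in block:
            sym, x, y, z = row.split()[:4]
            symbols.append(sym)
            coords.append((float(x), float(y), float(z)))
        frames.append((symbols, coords))
        i += 2 + n_atoms
    return frames


def orca_engrad_input(frame: Frame, charge: int = 0, multiplicity: int = 1) -> str:
    """Build an ORCA input for a single-point energy + gradient (no optimization)."""
    symbols, coords = frame
    out = ["! B97-3c EnGrad", f"* xyz {charge} {multiplicity}"]
    for sym, (x, y, z) in zip(symbols, coords):
        out.append(f"{sym:<2s} {x:14.8f} {y:14.8f} {z:14.8f}")
    out.append("*")
    return "\n".join(out) + "\n"


def inputs_from_trajectory(
    traj_text: str, stride: int = 10, charge: int = 0, multiplicity: int = 1
) -> List[str]:
    """Take every `stride`-th MD frame as-is and turn it into an ORCA input."""
    frames = read_xyz_frames(traj_text)
    return [orca_engrad_input(f, charge, multiplicity) for f in frames[::stride]]
```

Each returned string can be written to its own `.inp` file and run with ORCA. The geometry in each file is the raw MD snapshot, so the resulting energy and forces correspond to the non-equilibrium structure, which is what the training set needs.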

@isayev please correct me if my understanding is wrong.

xiaolinpan commented 9 months ago

Thank you very much!