xiaoruiDong / RDMC

Reaction Data and Molecular Conformers (RDMC) is a package for working with reactions, molecules, and conformers, mainly in 3D.
https://xiaoruidong.github.io/RDMC/
MIT License

Stochastic conformer generation workflow #14

Closed PattanaikL closed 2 years ago

PattanaikL commented 2 years ago

This is the initial merge of the stochastic conformer generation pipeline. All files relevant to the core workflow are stored in the conformer_generation folder, which includes initial guess embedders, optimizers, pruners, and metrics that detect when to stop the workflow. The power of the workflow relies on external packages. Importantly, we use xtb for optimization and crest for pruning, both of which can be installed through conda. For now, I've included a stripped-down version of GeoMol as an external package, which of course requires additional packages. If any of these packages aren't correctly installed, the conditional import statements should handle it.
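For illustration, here is a minimal sketch of the conditional import pattern mentioned above. It is not the code in this PR: the helper names `optional_import` and `require` are hypothetical, and locating `xtb` and `crest` on the PATH with `shutil.which` is an assumption about how those tools are found.

```python
import importlib
import shutil
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Import a module if it is installed; return None otherwise."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def require(dependency: object, what: str) -> None:
    """Raise a readable error when an optional dependency is missing."""
    if dependency is None:
        raise RuntimeError(f"{what} is required for this step but was not found")


# Assumption: the external programs are discovered on the PATH.
# shutil.which returns None when the executable is not installed.
XTB_PATH = shutil.which("xtb")
CREST_PATH = shutil.which("crest")
```

A step that needs an optional tool can then call, for example, `require(XTB_PATH, "xtb")` at the start of its work, so a missing install fails with a clear message only when that step is actually used.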

There are plenty of improvements we could make to this pipeline. Currently, the workflow keeps track of conformers using a dictionary structure to denote individual conformers, which seems redundant with the RDKitConf object. There are RDKitMol.Copy() calls littered throughout the code because I was worried about overwriting structures, and some of them are probably unnecessary. There are a few extraneous for loops that could be removed for faster performance. All embedder/optimizer/pruner classes would benefit from a parent class for organization. crest.py has some writing functions we could consolidate within RDKitMol. Basically everything could be parallelized better.
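As a rough sketch of the parent class idea, something like the following could work. None of these names exist in the PR: `WorkflowStep`, `EnergyWindowPruner`, and the `"energy"` key are hypothetical, and the list of dictionaries only mirrors the dictionary-per-conformer bookkeeping described above.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

# Assumption: each conformer is tracked as a plain dictionary.
Conformer = Dict[str, Any]


class WorkflowStep(ABC):
    """Hypothetical shared parent for embedders, optimizers, and pruners."""

    def __init__(self) -> None:
        self.n_calls = 0  # how many times this step has been invoked

    def __call__(self, conformers: List[Conformer]) -> List[Conformer]:
        self.n_calls += 1
        return self.run(conformers)

    @abstractmethod
    def run(self, conformers: List[Conformer]) -> List[Conformer]:
        """Transform the conformer list; subclasses must implement this."""


class EnergyWindowPruner(WorkflowStep):
    """Toy pruner: keep conformers within `window` of the lowest energy."""

    def __init__(self, window: float) -> None:
        super().__init__()
        self.window = window

    def run(self, conformers: List[Conformer]) -> List[Conformer]:
        if not conformers:
            return []
        e_min = min(c["energy"] for c in conformers)
        return [c for c in conformers if c["energy"] - e_min <= self.window]
```

With a common `__call__`, the workflow can chain steps uniformly, and shared bookkeeping such as call counts lives in one place instead of being repeated in every class.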

I'd also like to add optional methods to track the performance of this module (e.g., timing individual steps, checking the number of gradient calls, counting optimization failures). I'll probably work on adding these next since they are crucial for benchmarking different methods.
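One possible shape for that tracking, as a sketch only: a small stats object with a context manager that records wall time, call counts, and failures per step. `StepStats` and the step label `"optimize"` are hypothetical names, and counting gradient calls would need hooks into the optimizer that are not shown here.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StepStats:
    """Hypothetical per-step performance tracker."""

    def __init__(self) -> None:
        self.wall_time = defaultdict(float)  # seconds spent per step
        self.calls = defaultdict(int)        # invocations per step
        self.failures = defaultdict(int)     # exceptions raised per step

    @contextmanager
    def track(self, step: str):
        """Time the enclosed block and count it as a call (and a failure if it raises)."""
        start = time.perf_counter()
        self.calls[step] += 1
        try:
            yield
        except Exception:
            self.failures[step] += 1
            raise  # the caller still sees the original error
        finally:
            self.wall_time[step] += time.perf_counter() - start
```

Usage would look like `with stats.track("optimize"): ...` around each step, which keeps the tracking optional and out of the step classes themselves.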

For now, the workflow should work well for users!

xiaoruiDong commented 2 years ago

Thank you, @PattanaikL, for this awesome addition. I agree there are a few more things that can be improved, but since the code is very well written structure-wise, I will just merge it, and we can work on top of your implementation. I only have a minor question about adding almost the entire GeoMol to RDMC: are there any benefits over using a binary dependency (installed by pip or conda) of GeoMol? Or is it just because you haven't built one yet?

PattanaikL commented 2 years ago

I haven't built one yet. Do you think that's preferable to adding packages to external?