theochem / procrustes

Python library for finding the optimal transformation(s) that make two matrices as close as possible to each other.
https://procrustes.qcdevs.org/
GNU General Public License v3.0

Add support of 1D, 2D and 3D molecular structures #167

Open FanwangM opened 2 years ago

FanwangM commented 2 years ago

This is a long-term goal and it's mentioned in https://github.com/theochem/DiverseSelector/issues/55.

Do you mean having 1D (SMILES), 2D or 3D structures as input for Procrustes, and then Procrustes would compute the similarities between the molecules? I am thinking of having such functionality as a tools module. What do you think? @PaulWAyers

PaulWAyers commented 2 years ago

I think having a tool for 3D structure comparison is sensible. When the number or types of atoms differ, however, it is more complicated (though it could be done via graph similarity (2D) or fingerprint similarity (1D)). But I think that, as a Procrustes tool, only 3D similarity and (maybe) 2D (adjacency matrix) similarity make sense.

It may be that some of the tools are better added as utilities for DiverseSelector.

FarnazH commented 2 years ago

Among the existing notebooks, we already have examples that use procrustes functionality to compare 3D structures (using Cartesian coordinates loaded by IOData) and 2D structures (based on user-specified adjacency matrices; these can also be generated with RDKit, and we can update the example to show that); see https://github.com/theochem/procrustes/tree/master/doc/notebooks. @PaulWAyers and @fwmeng88, can you please clarify the scope of this issue?
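For concreteness, here is a rough sketch of what comparing two 2D structures through their adjacency matrices amounts to. It is not the procrustes API: it is a brute-force stand-in, using only NumPy, that tries every atom relabeling and keeps the one minimizing the squared difference between the matrices. The function name `best_relabeling_error` and the toy matrices are made up for illustration, and the exhaustive search is only feasible for very small graphs.

```python
from itertools import permutations

import numpy as np


def best_relabeling_error(adj_a, adj_b):
    """Smallest squared difference between B and any relabeling of A.

    Hypothetical helper (not part of procrustes). Both inputs must be
    square adjacency matrices of the same size. Cost grows as n!.
    """
    a = np.asarray(adj_a, dtype=float)
    b = np.asarray(adj_b, dtype=float)
    if a.shape != b.shape or a.shape[0] != a.shape[1]:
        raise ValueError("adjacency matrices must be square and equal in size")
    best_err, best_perm = np.inf, None
    for perm in permutations(range(a.shape[0])):
        idx = list(perm)
        # Reorder rows and columns of A together (same permutation on both sides).
        err = float(np.sum((a[np.ix_(idx, idx)] - b) ** 2))
        if err < best_err:
            best_err, best_perm = err, perm
    return best_err, best_perm


# Toy data: the same three-atom chain with two different atom orderings,
# plus a three-membered ring for contrast.
chain_a = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # central atom has index 1
chain_b = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]  # central atom has index 0
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

err_same, perm_same = best_relabeling_error(chain_a, chain_b)
err_diff, _ = best_relabeling_error(triangle, chain_b)
```

The two chains should match exactly after relabeling, while the ring and the chain cannot, since they differ by one bond. For realistic molecule sizes, an actual two-sided permutation method is needed instead of enumeration.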

As we discussed within the context of DiverseSelector, it's best not to add these utilities/wrappers to our various packages. It's best to demonstrate how our packages can be used in conjunction with external libraries (e.g., IOData and RDKit) which are meant to generate structures, fingerprints, etc.

PaulWAyers commented 2 years ago

I'm thinking about a utility function; it may be that it belongs in IOData or somewhere else. But we need to be able to compare molecules (in bulk, not via a notebook). In this case, I'm not sure that there is a need for IOData or RDKit; one reads in a list of atomic numbers and their positions and processes them directly. Probably we should discuss this now.
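To make the idea concrete, a minimal sketch of such a utility might look like the following. It assumes every molecule is given as a tuple of atomic numbers and an (n_atoms, 3) coordinate array, with the same atoms in the same order, and it uses a plain NumPy Kabsch superposition rather than any procrustes function. The names `kabsch_rmsd` and `pairwise_rmsd` are hypothetical.

```python
import numpy as np


def kabsch_rmsd(coords_a, coords_b):
    """RMSD between two (n_atoms, 3) arrays after optimal superposition."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("coordinate arrays must have the same shape")
    # Remove translation by centering both structures at the origin.
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    # The SVD of the covariance matrix yields the optimal orthogonal matrix.
    u, _, vt = np.linalg.svd(a.T @ b)
    # Flip the last singular direction if needed to get a proper rotation.
    d = np.sign(np.linalg.det(u @ vt))
    rot = u @ np.diag([1.0, 1.0, d]) @ vt
    return float(np.sqrt(np.mean(np.sum((a @ rot - b) ** 2, axis=1))))


def pairwise_rmsd(molecules):
    """Symmetric RMSD matrix for a list of (atomic_numbers, coordinates)."""
    n = len(molecules)
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            nums_i, xyz_i = molecules[i]
            nums_j, xyz_j = molecules[j]
            # This sketch only handles identical atom lists in identical order.
            if list(nums_i) != list(nums_j):
                raise ValueError("atom lists differ; not handled in this sketch")
            out[i, j] = out[j, i] = kabsch_rmsd(xyz_i, xyz_j)
    return out


# Toy data: a water-like geometry, a rotated and shifted copy, a stretched copy.
water = np.array([[0.0, 0.0, 0.117], [0.0, 0.757, -0.469], [0.0, -0.757, -0.469]])
rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
moved = water @ rot_z.T + np.array([1.0, 2.0, 3.0])
stretched = 1.1 * water

numbers = [8, 1, 1]
rmsd_matrix = pairwise_rmsd([(numbers, water), (numbers, moved), (numbers, stretched)])
```

The rotated and shifted copy should come out at essentially zero distance from the original, and the stretched copy at a small nonzero distance. Molecules with different numbers or types of atoms would need a different treatment, as discussed above.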

For 1D or 2D I don't see the need, right now, for stand-alone Python functions; notebook-style examples would suffice, I think.