sparks-baird / xtal2png

Encode/decode a crystal structure to/from a grayscale PNG image for direct use with image-based machine learning models such as Palette.
https://xtal2png.readthedocs.io/
MIT License

consider reconstructing fractional coordinates from distance matrix (via MDS or neural network) #83

Open sgbaird opened 2 years ago

sgbaird commented 2 years ago

I.e., using multidimensional scaling (MDS) or a trained network. If a trained network, an interesting approach might be to map the full representation (including redundant information) to the non-redundant inputs that are used directly: Structure(lattice, elements, coords).

If using MDS, the hyperparameter would probably be the weight in a weighted average of the direct fractional coordinates and the reconstructed fractional coordinates, i.e. a scalar between 0 and 1.
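A minimal sketch of the MDS route, assuming a complete, noise-free matrix of plain Euclidean (non-periodic) distances. The names `classical_mds` and `blend` and the `weight` parameter are hypothetical and not part of xtal2png. Classical MDS only recovers Cartesian coordinates up to rotation, reflection, and translation, so the result would still need to be aligned and converted to fractional coordinates before blending.

```python
import numpy as np


def classical_mds(dist, n_dims=3):
    """Recover coordinates (up to a rigid motion/reflection) from pairwise distances."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    # Double-center the squared distances to obtain the Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering
    # Keep the n_dims largest eigenpairs; clip tiny negative eigenvalues to zero.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_dims]
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)


def blend(direct, reconstructed, weight):
    """Weighted average of two coordinate sets; weight=1 keeps only `direct`."""
    return weight * np.asarray(direct) + (1.0 - weight) * np.asarray(reconstructed)
```

With `weight` set to 1 the decoder would ignore the distance matrix entirely, and with 0 it would rely on the reconstruction alone.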

See also:

Anand, N.; Eguchi, R.; Huang, P.-S. Fully Differentiable Full-Atom Protein Backbone Generation. 2019. https://openreview.net/forum?id=SJxnVL8YOV

which uses pairwise distance matrix reconstruction.

Good discussion and examples about reconstructing directly vs. using a neural network in:

(1) Ovchinnikov, S.; Huang, P.-S. Structure-Based Protein Design with Deep Learning. Current Opinion in Chemical Biology 2021, 65, 136–144. https://doi.org/10.1016/j.cbpa.2021.08.004.

kjappelbaum commented 2 years ago

Lilienfeld et al. also generated distance matrices using an ML model; they used dgsol to go from the distance matrix to coordinates. Here's the download: https://www.mcs.anl.gov/~more/dgsol/. I didn't see any Python bindings, though.

sgbaird commented 2 years ago

Interesting that it supports using sparse pairwise distance matrices. Looking at the citations to one of the early dgsol papers, I'm realizing how rich the literature is for this topic, but a bit disappointed by the sparsity (esp. in Python) on GitHub [1][2]. The idea of being able to constrain based on lower and upper bounds and uncertainties came up in the context of molecular reconstruction.

I guess there's a CMake version of dgsol. I'm vaguely familiar with building external software (e.g. C++ code) as a part of packaging conda packages. dgsol or similar might still be worth exploring.

EDIT: https://github.com/wjakob/nanobind_example could be helpful. Not sure.

sgbaird commented 2 years ago

Implementing this feature would also require having an origin and some kind of alignment/orientation relative to the unit cell. Hmm..

119

sgbaird commented 2 years ago

Alternative would be to use site-to-site x,y,z vectors as RGB encodings rather than a pairwise distance matrix, which is what @michaeldalverson is doing right now.
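A rough sketch of what such an encoding could look like, assuming Cartesian coordinates without periodic images and a known bound `max_abs` on every displacement component. The function names and the scaling scheme are hypothetical illustrations, not the implementation referred to above.

```python
import numpy as np


def encode_displacements(coords, max_abs):
    """Map site-to-site displacement vectors to an (N, N, 3) uint8 RGB array."""
    coords = np.asarray(coords, dtype=float)
    # disp[i, j] is the vector pointing from site i to site j.
    disp = coords[None, :, :] - coords[:, None, :]
    # Shift and scale each component from [-max_abs, max_abs] to [0, 1].
    scaled = (disp + max_abs) / (2.0 * max_abs)
    return np.round(np.clip(scaled, 0.0, 1.0) * 255).astype(np.uint8)


def decode_relative_coords(image, max_abs, ref=0):
    """Read coordinates relative to site `ref` directly from one row of the image."""
    disp = image.astype(float) / 255.0 * 2.0 * max_abs - max_abs
    return disp[ref]
```

Unlike a distance matrix, a single row already gives every site position relative to the reference site, so no MDS step is needed, although the absolute origin is still lost.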

kjappelbaum commented 2 years ago

> Implementing this feature would also require having an origin and some kind of alignment/orientation relative to the unit cell. Hmm..

Once you have the fractional coordinates and the coordinates reconstructed from the pairwise distance matrix, you can compute the optimal rotation of one onto the other, e.g. using the Kabsch algorithm. However, I'm not sure what one should do if they do not match up: average them, take one (which one?), fail if the disagreement is too large ...
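A minimal NumPy sketch of the Kabsch step, assuming both coordinate sets are Cartesian, have the same number of sites, and list the sites in the same order. The function name and the `allow_reflection` flag are hypothetical. The flag matters because a distance matrix cannot distinguish a structure from its mirror image, so a proper rotation alone may not superimpose an MDS result.

```python
import numpy as np


def kabsch_align(mobile, target, allow_reflection=False):
    """Rigidly superimpose `mobile` onto `target`; return aligned coords and RMSD."""
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    # Remove the translation by centering both point sets.
    tgt_centroid = target.mean(axis=0)
    p = mobile - mobile.mean(axis=0)
    q = target - tgt_centroid
    # SVD of the 3x3 cross-covariance matrix.
    u, _, vt = np.linalg.svd(p.T @ q)
    d = 1.0
    if not allow_reflection and np.linalg.det(vt.T @ u.T) < 0:
        d = -1.0  # flip one axis so the result is a proper rotation
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    aligned = p @ rotation.T + tgt_centroid
    rmsd = np.sqrt(np.mean(np.sum((aligned - target) ** 2, axis=1)))
    return aligned, rmsd
```

The returned RMSD is one possible measure of disagreement: a caller could average the two coordinate sets when it is small and raise an error above some tolerance, with the tolerance itself being another tunable assumption.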

sgbaird commented 2 years ago

Came across another repo that might be of interest here: https://github.com/stevenygd/PointFlow