noegroup / EDMnets

parameterizing valid Euclidean distance matrices (EDMs) via neural networks
MIT License

Would be great to see this packaged on pip with some simpler usage / API instructions #2

Open sgbaird opened 2 years ago

sgbaird commented 2 years ago

Assuming I'm understanding the functionality of the repository correctly, I'm imagining something like:

pip install edmnets

(see https://packaging.python.org/en/latest/tutorials/packaging-projects/)

from edmnets import EDM
edm = EDM()
edm.fit(train_coordinate_sets)
test_coordinates_pred = edm.predict(test_distance_matrix)

Related to https://github.com/sparks-baird/xtal2png/issues/83 and https://github.com/qzhu2017/PyXtal/issues/199

clonker commented 2 years ago

Am I understanding you correctly in that you want to obtain coordinates from a distance matrix (as in, e.g., MDS)? And do you also want to match the obtained coordinates to a reference set? In any case, I can publish the code on pip; that is not a problem.

sgbaird commented 2 years ago

@clonker thanks for the reply!

> Am I understanding you correctly in that you want to obtain coordinates from a distance matrix (as in, e.g., MDS)? And do you also want to match the obtained coordinates to a reference set?

That's correct on both counts. In the context of this repo, I was only thinking about the first one, but I'd appreciate any insights you might have about the second one!

> obtain coordinates from a distance matrix (as in, e.g., MDS)

I.e., train edmnet to reconstruct coordinates from the distance matrices of ~100k crystal structures, probably with some data augmentation in which noise is added to the distance matrices for robustness. The idea is to get it to reconstruct "realistic" Euclidean coordinates for the atoms in a crystal structure. MDS might be a bit too naive for this when the distance matrices are "weird" (since they'll usually come from a generative model that isn't explicitly aware of the Euclidean nature of the matrix).
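For reference, the MDS baseline mentioned above can be sketched in a few lines of NumPy. This is classical (Torgerson) MDS only, not anything from EDMnets; the helper names `classical_mds` and `pairwise_distances`, the toy point cloud, and the noise scale are all assumptions made for illustration. Coordinates are recovered only up to rotation, translation, and reflection.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix of the rows of X (hypothetical helper)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def classical_mds(D, dim=3):
    """Classical MDS: coordinates from a distance matrix (hypothetical helper)."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Double-center the squared distances to obtain a Gram matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    G = -0.5 * J @ (D ** 2) @ J
    # eigh returns eigenvalues in ascending order; keep the `dim` largest.
    w, v = np.linalg.eigh(G)
    idx = np.argsort(w)[::-1][:dim]
    # Negative eigenvalues only occur for non-Euclidean input; clip them to zero.
    w_top = np.clip(w[idx], 0.0, None)
    return v[:, idx] * np.sqrt(w_top)


rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))          # toy "atom" coordinates
D = pairwise_distances(X)

# Exact distance matrix: the distances are reproduced up to round-off.
X_rec = classical_mds(D)
D_rec = pairwise_distances(X_rec)
print("max distance error (clean):", np.abs(D - D_rec).max())

# Perturbed, symmetric matrix standing in for a generative model's output.
noise = rng.normal(scale=0.05, size=D.shape)
D_noisy = np.abs(D + (noise + noise.T) / 2)
np.fill_diagonal(D_noisy, 0.0)
X_noisy = classical_mds(D_noisy)
print("max distance error (noisy):", np.abs(D - pairwise_distances(X_noisy)).max())
```

With a perturbed matrix the Gram matrix is generally no longer positive semidefinite, so clipping the eigenvalues is only a crude projection; that is the regime where a learned reconstruction might do better.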

> match the obtained coordinates to a reference set

The second part, matching/aligning coordinates to a reference set (and trying to homogenize them in some way, e.g. by taking the average), is another outstanding issue for me with xtal2png, so if you have any advice in that regard, that would be great.

> In any case, I can publish the code on pip; that is not a problem.

Great, thank you!

clonker commented 2 years ago

I'll clean up my code a bit and push it here. I have ported things to PyTorch in the meantime. When assigning atoms to one another based on distance matrices, you can formulate it as a linear sum assignment problem (which can be solved with, e.g., the Hungarian algorithm, as here). Right now, in this repository, the Hungarian algorithm is implemented in C++, but it is also available in SciPy. I will also upload that version. Once you have the assignment, you can simply align the two structures with something like the Kabsch algorithm. It is implemented, for example, here in MDTraj.
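The two steps described above (assignment, then alignment) can be sketched with SciPy and NumPy as follows. This is a minimal illustration, not the repository's code: `scipy.optimize.linear_sum_assignment` is the real SciPy solver, but the cost matrix (comparing sorted rows of the two distance matrices) is just one assumed choice, and `match_atoms`, `kabsch_align`, and the toy data are hypothetical names and values.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def pairwise_distances(X):
    """Euclidean distance matrix of the rows of X (hypothetical helper)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def match_atoms(D_ref, D_other):
    """Return col such that atom col[i] of `other` is matched to atom i of `ref`.

    Assumption: each atom is described by its sorted row of distances, which
    does not depend on how the atoms are ordered.
    """
    S_ref = np.sort(D_ref, axis=1)
    S_other = np.sort(D_other, axis=1)
    cost = np.linalg.norm(S_ref[:, None, :] - S_other[None, :, :], axis=-1)
    _, col = linear_sum_assignment(cost)
    return col


def kabsch_align(P, Q):
    """Rigidly move P onto Q (rows must already correspond) via the Kabsch algorithm."""
    P_mean, Q_mean = P.mean(axis=0), Q.mean(axis=0)
    Pc, Qc = P - P_mean, Q - Q_mean
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # Force a proper rotation (determinant +1), i.e. no reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return Pc @ R.T + Q_mean


rng = np.random.default_rng(1)
n = 6
X = rng.normal(size=(n, 3))                      # reference structure

# Build a rotated, translated, and shuffled copy of the reference.
Rot, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Rot) < 0:
    Rot[:, 0] *= -1.0                            # make it a proper rotation
perm = rng.permutation(n)
Y = (X @ Rot.T + np.array([1.0, -2.0, 0.5]))[perm]

col = match_atoms(pairwise_distances(X), pairwise_distances(Y))
Y_aligned = kabsch_align(Y[col], X)
print("max coordinate error after alignment:", np.abs(Y_aligned - X).max())
```

Because the rotation is constrained to be proper, a mirror-image structure (which MDS-style reconstructions can produce) would not be mapped onto the reference; handling that case would need an explicit reflection check. Highly symmetric structures can also give atoms identical distance signatures, in which case this cost matrix alone does not determine a unique assignment.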