Closed BvB93 closed 3 years ago
There are two related problems:
For both, one can use (dis-)similarity metrics (Euclidean, cosine). However, for the comparison of trajectories, it might be enough to plot and compare RDFs, ADFs and phonon spectra (unless you compare many different trajectories).
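To illustrate how the choice of metric matters, here is a minimal sketch using `scipy.spatial.distance`. The two descriptor vectors are made-up placeholder values, not real data: they point in the same direction but differ in length, so the Euclidean distance is non-zero while the cosine distance vanishes.

```python
import numpy as np
from scipy.spatial.distance import cosine, euclidean

# Two hypothetical global descriptor vectors (placeholder values, not real data)
desc_a = np.array([1.0, 0.0, 2.0])
desc_b = np.array([2.0, 0.0, 4.0])  # same direction as desc_a, twice the length

d_euclid = euclidean(desc_a, desc_b)  # sensitive to the magnitude of the vectors
d_cosine = cosine(desc_a, desc_b)     # only sensitive to the angle between them
```

Which of the two is more appropriate probably depends on whether the descriptor is normalized.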
To compare two structures, one has to:
This could then be used to compare a whole trajectory to a reference structure, to track the change between consecutive frames, or to create a distance matrix for a trajectory. Optional: carry out dimensionality reduction (e.g. Kernel PCA) based on such distance kernels and descriptors on a dataset (e.g. https://pubs.acs.org/doi/full/10.1021/acs.accounts.0c00403). Dimensionality reduction methods based on various distance metrics are available in scikit-learn (https://scikit-learn.org/stable/modules/classes.html#module-sklearn.decomposition & https://scikit-learn.org/stable/modules/classes.html#module-sklearn.manifold).
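The three use cases mentioned here could look roughly as follows. This is only a sketch: the function names are hypothetical, and it assumes the descriptors of a trajectory are already available as an `n x M` array (`n` frames, `M` descriptor components).

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist, squareform


def distance_to_reference(desc_traj, desc_ref, metric="euclidean"):
    """Distance of every frame (row of desc_traj) to a single reference descriptor."""
    return cdist(desc_traj, np.atleast_2d(desc_ref), metric=metric)[:, 0]


def consecutive_distances(desc_traj, metric="euclidean"):
    """Distance between frame i and frame i+1 for all i."""
    return np.array([
        cdist(a[None, :], b[None, :], metric=metric)[0, 0]
        for a, b in zip(desc_traj[:-1], desc_traj[1:])
    ])


def distance_matrix(desc_traj, metric="euclidean"):
    """Full, symmetric n x n distance matrix of a trajectory."""
    return squareform(pdist(desc_traj, metric=metric))


# Random numbers standing in for real descriptors (5 frames, 8 components)
rng = np.random.default_rng(0)
desc_traj = rng.random((5, 8))

to_ref = distance_to_reference(desc_traj, desc_traj[0])
steps = consecutive_distances(desc_traj)
dmat = distance_matrix(desc_traj)
```

The consecutive-frame distances are simply the first off-diagonal of the full distance matrix, so for short trajectories computing the matrix once may be enough.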
For the descriptors:
import numpy as np

def generate_descriptor(frames, desc: dict):
    """Create the descriptors (type and parameters in desc) for the trajectory in frames
    (frames basically has the information of an xyz file)."""
    try:
        desc_type = desc["type"]
    except KeyError:
        raise ValueError("descriptor type not defined")
    # all remaining entries of desc are treated as descriptor parameters
    parameters = {key: value for key, value in desc.items() if key != "type"}
    if desc_type == "SOAP":
        # documentation of the dscribe SOAP class: https://singroup.github.io/dscribe/latest/tutorials/soap.html
        # problem with SOAP from dscribe: the trajectory has to be provided
        # as a list of ase.Atoms -> convert from MultiMolecule??
        from dscribe.descriptors import SOAP
        if not parameters:
            raise ValueError("no parameters defined for the SOAP descriptor")
        # create an instance of the SOAP class
        soap = SOAP(**parameters)
        ret = soap.create(frames)
    elif desc_type == "radial_hist":
        # parameters required for the histogram descriptor,
        # e.g. broadening, cutoff, elements/pairs of elements
        if not parameters:
            raise ValueError("no parameters defined for the radial histogram descriptor")
        # generate_radial_histogram still has to be implemented
        ret = generate_radial_histogram(frames, parameters)
    # elif ...: further descriptor types
    else:
        raise ValueError(f"unknown descriptor type: {desc_type!r}")
    return ret
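The `generate_radial_histogram` helper is not defined anywhere in this issue. One possible shape for it is sketched below, under several assumptions that are not part of the proposal above: `frames` is an `(n_frames, n_atoms, 3)` array of Cartesian coordinates, the parameter keys `"cutoff"`, `"n_bins"` and `"broadening"` are hypothetical names, periodic boundary conditions are ignored, and no distinction between elements is made.

```python
import numpy as np
from scipy.spatial.distance import pdist


def generate_radial_histogram(frames, parameters):
    """Gaussian-broadened histogram of all pair distances, one row per frame.

    Assumed (hypothetical) parameter keys: "cutoff", "n_bins", "broadening".
    """
    cutoff = parameters["cutoff"]
    n_bins = parameters["n_bins"]
    sigma = parameters["broadening"]

    frames = np.asarray(frames, dtype=float)
    # bin centers, evenly spaced between 0 and the cutoff
    centers = (np.arange(n_bins) + 0.5) * cutoff / n_bins

    ret = np.empty((len(frames), n_bins))
    for i, xyz in enumerate(frames):
        dist = pdist(xyz)           # all unique pair distances of this frame
        dist = dist[dist < cutoff]  # discard pairs beyond the cutoff
        # every distance contributes a Gaussian of width sigma to each bin
        gauss = np.exp(-0.5 * ((centers[None, :] - dist[:, None]) / sigma) ** 2)
        ret[i] = gauss.sum(axis=0)
    return ret


# Two frames of a hypothetical diatomic: the second one is a rigid translation
frame = np.array([[0.0, 0.0, 0.0], [0.75, 0.0, 0.0]])
frames = np.array([frame, frame + 3.0])
hist = generate_radial_histogram(
    frames, {"cutoff": 2.0, "n_bins": 4, "broadening": 0.1}
)
```

Because the descriptor only depends on interatomic distances, it is invariant to translations and rotations, so both frames give the same histogram.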
For the similarity:
from sklearn.metrics import pairwise_distances
from scipy.spatial.distance import pdist, squareform
import numpy as np

def get_similarity(desc1: np.ndarray, desc2: np.ndarray, metric: str):
    """Calculate the (dis-)similarity between two structures with the given metric.
    desc1 and desc2: 1xM dim arrays of global descriptors (M is the size of the descriptors);
    in principle NxM dim arrays of atomic descriptors should be possible as well, but
    this can become memory demanding (N is the number of atoms in the frame)."""
    ret = pairwise_distances(desc1, desc2, metric=metric)  # calculate distance
    return ret

def get_distance_matrix(desc_traj: np.ndarray, metric: str):
    """Calculate the distance matrix from the descriptors along a trajectory.
    desc_traj: n x M dim matrix (n=number of frames, M=dimensionality of global descriptor)"""
    # pdist returns the condensed distances; squareform expands them to the full n x n matrix
    ret = squareform(pdist(desc_traj, metric=metric))
    return ret

def dim_reduction(desc_traj: np.ndarray, method: str, metric: str, n_CV: int):
    """Carry out dimensionality reduction with the descriptors along a trajectory
    with method (e.g. KPCA, ICA, ISOMAP, ...), where the distance is defined by metric.
    n_CV: number of collective variables after dimensionality reduction"""
    if method == 'KPCA':
        # do KPCA
        ...
    # ... (further methods)
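For the KPCA branch, one option might be to turn the distance matrix into a kernel and pass it to scikit-learn's `KernelPCA` with `kernel="precomputed"`. The sketch below assumes a Gaussian kernel `exp(-gamma * d**2)`; the function name and the `gamma` parameter are hypothetical and not part of the proposal above.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.decomposition import KernelPCA


def kpca_from_distances(desc_traj, metric="euclidean", n_CV=2, gamma=1.0):
    """Kernel PCA on a Gaussian kernel built from pairwise descriptor distances."""
    dist = squareform(pdist(desc_traj, metric=metric))  # n x n distance matrix
    kernel = np.exp(-gamma * dist ** 2)                 # assumed Gaussian kernel
    kpca = KernelPCA(n_components=n_CV, kernel="precomputed")
    return kpca.fit_transform(kernel)                   # n x n_CV collective variables


# Random numbers standing in for real descriptors (10 frames, 6 components)
rng = np.random.default_rng(0)
cvs = kpca_from_distances(rng.random((10, 6)), metric="euclidean", n_CV=2)
```

For the Euclidean metric this kernel is positive semidefinite; for other metrics that is not guaranteed, so the result should be checked before it is used.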
Pinging @lfeld1.
The recipe should:
Examples
A mockup example: