nlesc-nano / auto-FOX

A library for analyzing potential energy surfaces (PESs) and using the resulting PES descriptors for constructing forcefield parameters.
GNU Lesser General Public License v3.0
9 stars 8 forks source link

ENH: Add a recipe for calculating the similarity between 2 MD trajectories #206

Closed BvB93 closed 3 years ago

BvB93 commented 3 years ago

Pinging @lfeld1.

The recipe should:

Examples

A mockup example:

from typing import Any, Callable, Union
import numpy as np

Callback = Callable[[np.ndarray, np.ndarray], np.ndarray]

def compare_trajectories(
    md: np.ndarray, md_ref: np.ndarray, metric: Union[str, Callback] = "cosine", **kwargs: Any
) -> np.ndarray:
    """Compare and return the similarity between 2 MD trajectories using the specified **metric**."""
    md_ar = np.asarray(md, dtype=np.float64)
    md_ref_ar = np.asarray(md_ref, dtype=np.float64)

    if metric == "cosine":
        ret: np.ndarray = ...  # Insert cosine distance calculation function
    else:
        ret = metric(md_ar, md_ref_ar, **kwargs)
    return ret
lfeld1 commented 3 years ago

There are two related problems:

  1. compare two trajectories, as you have written above: Comparing two trajectories could be based on averaged properties such as RDF, ADF, RMS-displacement, Phonon spectrum or the average of some descriptor.
  2. compare two frames: This requires descriptors of the structure, e.g. SOAP descriptors, Coulomb matrix or radial histograms. Atomic descriptors (e.g. SOAP) allow comparing atomic environments between different atoms within one structure or the same atom in different frames (e.g. change of atom's environement along trajectory).

For both, one can use (dis-)similarity metrics (euclidian, cosine). However, for the comparison of trajectories, it might be enough to plot and comare RDFs, ADFs and phonon spectra (unless you compare many different trajectories).

To compare two structures, one has to:

This could then be used to compare a whole trajectory to a reference structure, the change in consecutive frames or create a distance matrix for a trajectory. Optional: carry out dimensionality reduction (e.g. Kernel PCA) based on such distance kernels and descriptors on a dataset (e.g. https://pubs.acs.org/doi/full/10.1021/acs.accounts.0c00403). Dimensionality reductions based on various distance metrics are available in scipy (https://scikit-learn.org/stable/modules/classes.html#module-sklearn.decomposition & https://scikit-learn.org/stable/modules/classes.html#module-sklearn.manifold)

For the descriptors:

import numpy as np

def generate_descriptor(frames, desc: dict={} ):
    """create the descriptors (type and parameters in desc) for the trajectory in frames
    (basically has the information of an xyz file)"""

    try:  desc_type = dict["type"]
    except:
        raise ValueError("descriptor type not defined")

    if desc_type=="SOAP":
        from dscribe.descriptors import SOAP   
        """documentation of dscribe SOAP function: https://singroup.github.io/dscribe/latest/tutorials/soap.html
        problem with soap from dscribe: is trajectory is provide as ase.Atoms list -> MultMolecule??"""
        try: 
            # get get the parameter for required for SOAP descriptors
        except: 
             # raise error
        # create instance of SOAP funcition
        soap = SOAP( ... )
        ret = soap.create( frames )

    elif desc_type=="radial_hist":
        try: 
            # get get the parameter for required for histogram descriptors
            # e.g. broadening, cutoff, elements/pairs of elements
        except: 
             # raise error
        ret = generate_radial_histogram(frames, parameters)

    elif ....

    return ret

For the similarity


from sklearn.metrics import pairwise_distances
from scipy.spatial.distance import pdist
import numpy as np

def get_similarity(desc1: np.array, desc2: np.array, metric: str ):
    """calculate the (dis-)similarity between two structures with the metric
        desc1 and desc2: 1xM dim arrays of global descriptors (M is the size of the descriptors)
         in principle NxM dim arrays of atomic descriptors should be possible as well, but 
         this can become memory demanding (N is the number of atoms in the frame)"""

    ret = pairwise_distance(desc1, desc2, metric)        # calculate distance
    return ret

def get_distance_matrix(desc_traj: np.ndarray, metric: str):
    """calculate the distance matrix of a from the descripotrs along a trajectory
    desc_traj: n x M dim matrix (n=number of frames, M=dimensioanlity of global descriptor)"""

   ret = pdist(desc_traj, metric=metric)   # calculate distance matrix
   return ret

def dim_reduction(desc_traj: np.ndarray, method: str, metric: str, n_CV: int)
    """carry out dimensionality reduction with the descriptors along a trajectory 
    with with method (e.g. KPCA, ICA, ISOMAP, ...) where the distance is defined by metric
    n_CV: number of collective variables after dimensionlity reduction"""

   if method=='KPCA':
       # do KPCA
  ....