lrcfmd / ElMD

The Element Movers Distance for chemical composition similarity
GNU General Public License v3.0

Interesting follow-up work to ElMD within GRID #27

Closed: sgbaird closed this issue 8 months ago

sgbaird commented 2 years ago

From the "Predicting bulk modulus" section of https://doi.org/10.26434/chemrxiv-2022-9m4jh

To address the absence of compositional information, we have therefore combined our GRID EMD with a similar compositional EMD, in a modified version of that demonstrated by Hargreaves et al.[20] Our method represents the normalised elemental fractions as a 78-element vector in atomic number order (considering elements up to Bi, but excluding noble gases); taking SrTiO3 as a representative example would give values of 0.6, 0.2 and 0.2 at the 7th, 19th and 34th elements in this vector, respectively. Rather than ordering this vector by Pettifor scale and computing EMD directly as in [20], we instead introduce a pairwise dissimilarity metric (Fig. S3) between elements based on the statistical likelihood of species occurring within the same crystal structure (see methods).[20] The advantage of this approach is that while the Pettifor scale assumes a constant distance between adjacent species, the substitutional (dis)similarity approach gives a more chemically meaningful metric. For example, the lanthanide series (La – Yb) covers a range of 14 steps on the Pettifor scale, while the dissimilarity approach gives a range of 0.4. In contrast, Na and F are adjacent on the Pettifor scale (one step), but have a dissimilarity of 4.38.
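As a quick check of the indexing in the quoted SrTiO3 example, here is a minimal plain-Python sketch of the 78-element vector (H through Bi, noble gases removed, atomic-number order). The names `ELEMENTS`, `INDEX` and `composition_vector` are hypothetical and not taken from gridrdf or ElMD, and the sketch assumes a simple `{symbol: amount}` dict as input rather than parsing formula strings.

```python
# Element symbols H (Z=1) through Bi (Z=83), in atomic-number order.
SYMBOLS_TO_BI = (
    "H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni "
    "Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe "
    "Cs Ba La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au "
    "Hg Tl Pb Bi"
).split()
NOBLE_GASES = {"He", "Ne", "Ar", "Kr", "Xe"}

# Hypothetical helpers: the 78 species kept in the vector, still in Z order.
ELEMENTS = [el for el in SYMBOLS_TO_BI if el not in NOBLE_GASES]
INDEX = {el: i for i, el in enumerate(ELEMENTS)}  # 0-based positions


def composition_vector(counts):
    """Map a {symbol: amount} dict to a normalised 78-element vector."""
    total = sum(counts.values())
    vec = [0.0] * len(ELEMENTS)
    for el, amount in counts.items():
        vec[INDEX[el]] = amount / total
    return vec


v = composition_vector({"Sr": 1, "Ti": 1, "O": 3})
print(len(v))                                      # 78
print({el: v[INDEX[el]] for el in ("O", "Ti", "Sr")})
print([INDEX[el] + 1 for el in ("O", "Ti", "Sr")])  # 1-based: [7, 19, 34]
```

The 1-based positions 7, 19 and 34 fall out because He precedes O, He/Ne/Ar precede Ti, and He/Ne/Ar/Kr precede Sr.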

Repo: https://github.com/CumbyLab/gridrdf

Curious to hear your thoughts.

sgbaird commented 2 years ago

See also "Dissimilarity measure of local structure in inorganic crystals using Wasserstein distance to search for novel phosphors" https://doi.org/10.1080/14686996.2021.1899555

https://github.com/CumbyLab/gridrdf/issues/5

SurgeArrester commented 2 years ago

Hey Sterling,

I'm a big fan of the non-linear scale used here. Dr. Cumby actually presented this work to our team a few months back, and I was thinking of adding it to the ElMD codebase at the time, but then thesis writing took over 🥳

I think it would take a fairly achievable amount of tweaking for the supporting distance functions to accept a 2D elementwise (pairwise dissimilarity) matrix without tripping anything up in the backend; it's mathematically sound. I recently came across this linear scale, which I also quite like, and was thinking of adding it to the set of dictionaries, so I'll probably add that feature at the same time.
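To illustrate why a 2D matrix is a drop-in generalisation, here is a plain-Python sketch of an exact EMD under an arbitrary pairwise cost matrix, solved as a small transportation problem via min-cost flow (successive shortest paths with Bellman-Ford). This is a generic technique, not ElMD's actual backend; `emd`, `emd_1d` and all numeric values are hypothetical, and the toy matrix is made up rather than taken from Fig. S3 of the GRID paper. A 1D scale such as Pettifor ranks is just the special case `cost[i][j] = |x_i - x_j|`.

```python
import math


def emd(a, b, cost, eps=1e-12):
    """Exact EMD between histograms a and b (equal total mass) under an
    arbitrary pairwise cost matrix, via successive shortest paths."""
    n, m = len(a), len(b)
    src, sink = n + m, n + m + 1
    adj = [[] for _ in range(n + m + 2)]
    head, cap, w = [], [], []  # edge e and its reverse e ^ 1 are stored in pairs

    def add_edge(u, v, capacity, weight):
        for x, y, c, k in ((u, v, capacity, weight), (v, u, 0.0, -weight)):
            adj[x].append(len(head))
            head.append(y)
            cap.append(c)
            w.append(k)

    for i in range(n):
        add_edge(src, i, a[i], 0.0)  # supply of species i
        for j in range(m):
            add_edge(i, n + j, math.inf, cost[i][j])  # move mass i -> j
    for j in range(m):
        add_edge(n + j, sink, b[j], 0.0)  # demand of species j

    total = 0.0
    while True:
        # Bellman-Ford, because residual (reverse) edges carry negative costs.
        dist = [math.inf] * len(adj)
        prev = [-1] * len(adj)
        dist[src] = 0.0
        for _ in range(len(adj) - 1):
            changed = False
            for u, edges in enumerate(adj):
                if dist[u] == math.inf:
                    continue
                for e in edges:
                    if cap[e] > eps and dist[u] + w[e] < dist[head[e]] - eps:
                        dist[head[e]] = dist[u] + w[e]
                        prev[head[e]] = e
                        changed = True
            if not changed:
                break
        if dist[sink] == math.inf:
            return total  # no augmenting path left: all mass has been moved
        # Push as much mass as the tightest edge on the path allows.
        flow, v = math.inf, sink
        while v != src:
            flow = min(flow, cap[prev[v]])
            v = head[prev[v] ^ 1]
        v = sink
        while v != src:
            cap[prev[v]] -= flow
            cap[prev[v] ^ 1] += flow
            v = head[prev[v] ^ 1]
        total += flow * dist[sink]


def emd_1d(a, b, positions):
    """Closed form on a sorted 1D scale: integrate |CDF_a - CDF_b|."""
    total, cum_a, cum_b = 0.0, 0.0, 0.0
    for k in range(len(positions) - 1):
        cum_a += a[k]
        cum_b += b[k]
        total += abs(cum_a - cum_b) * (positions[k + 1] - positions[k])
    return total


# 1D special case: Pettifor-like ranks, adjacent species exactly one step apart.
ranks = [0.0, 1.0, 2.0, 3.0]
cost_1d = [[abs(p - q) for q in ranks] for p in ranks]
a = [0.5, 0.5, 0.0, 0.0]
b = [0.0, 0.0, 0.5, 0.5]
print(emd(a, b, cost_1d), emd_1d(a, b, ranks))  # 2.0 2.0

# General case: a made-up, non-uniform dissimilarity matrix for three species.
toy = [[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
print(emd([0.5, 0.5, 0.0], [0.0, 0.5, 0.5], toy))  # 1.5
```

Bellman-Ford keeps the sketch short and dependency-free; for real 78-element vectors a dedicated LP or network-simplex solver would likely be much faster.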

Interestingly, the approach they've taken with GRID is a really cool structural descriptor that was being investigated simultaneously and independently by a few members of our team in Liverpool, including ElMD developer @dwiddo. Check out their paper here and codebase here for their implementation.

I hadn't seen the phosphors paper before, so cheers for sending that over! That's a really interesting approach, and I can see how you could get more "structural fidelity" by restricting the search to structural descriptors that are fundamentally similar, as it's hard to say what the signal-to-noise ratio is for highly dissimilar structures. I wonder if it could be combined with something like ChemEnv to get better input environments to search against structural prototypes...