kjappelbaum / element-coder

Encode chemical elements numerically and decode numerical representations of elements.
https://element-coder.readthedocs.io/en/latest/?badge=latest
MIT License

add docs that provide background about encodings #6

Open kjappelbaum opened 2 years ago

kjappelbaum commented 2 years ago

would be nice to link to some papers/further resources

sgbaird commented 2 years ago

Element Mover's Distance

(1) Hargreaves, C. J.; Dyer, M. S.; Gaultois, M. W.; Kurlin, V. A.; Rosseinsky, M. J. The Earth Mover’s Distance as a Metric for the Space of Inorganic Compositions. Chem. Mater. 2020, 32 (24), 10610–10620. https://doi.org/10.1021/acs.chemmater.0c03381.

A snippet relevant to scalar elemental featurizers:

We could assign the atomic number as the vector index for each element, then take the difference between indices as a measure of elemental similarity, but this approach loses the natural clustering of chemical properties afforded by the periodic table. An ideal elemental indexing would perfectly capture the chemical trends observed in nature, but ordering the elements in such a manner is problematic. As well as the unclear resolution of how to handle the f-block elements, chemical trends moving down the periodic table tend to be the direct opposite of those moving across. This leads to some elements having greater substitutional feasibility to their diagonal neighbor than their immediate neighbor, making a simple placement of these difficult.
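To make the first point of that snippet concrete, here is a minimal sketch of the "difference between indices" idea using atomic numbers as the index. The helper name `index_distance` and the choice of three elements are mine, for illustration only; nothing here comes from the paper's code or from element-coder.

```python
# Atomic numbers are standard values; the three elements are chosen only
# for illustration.
ATOMIC_NUMBER = {"Li": 3, "Be": 4, "Na": 11}


def index_distance(a: str, b: str, scale: dict[str, int] = ATOMIC_NUMBER) -> int:
    """Absolute difference between two elements' positions on a scalar scale."""
    return abs(scale[a] - scale[b])


print(index_distance("Li", "Be"))  # 1: neighbors in atomic number, different groups
print(index_distance("Li", "Na"))  # 8: both alkali metals, yet far apart in index
```

Li and Na sit in the same group but end up eight steps apart, while Li and Be are adjacent. That mismatch is what a chemically informed ordering such as a Pettifor-style scale tries to reduce.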

To solve this problem, Pettifor proposed a method of labeling the elemental scale in his seminal paper of 1984,[10] drawn from extensive domain knowledge. These numeric labels may form the basis of a coordinate system allowing us to associate patterns in geometric and physiochemical properties, with extensions to this idea continuing to guide practitioners.[11,12] This concept of labeling was further developed by analyzing the probability that an element can be substituted for another given the same structural framework on 20,500 compounds of the inorganic crystal structure database (ICSD) by Glawe et al.[13] This probability matrix can be reordered to maximize the likelihood that local neighborhoods will contain elements with greater feasibility of stable substitutions, thus possessing inherent chemical similarities.[14] We take the associated indices of this final ordering to give each element its modified Pettifor number.

In this report, we define a composition vector by taking the ratio of each element in a compound assigned to the index of its respective modified Pettifor number. By assuming the sample of the set of feasibly stable compounds (although we know this is not strictly the case[15]), we can see that these indices capture the truly physical similarities between elements from statistical analysis. Using the modified Pettifor scale gives resultant similarities between compounds which align with human judgement but may be substituted with any continuous elemental scale including less equally spaced distributions, for example, Pauling electronegativity.
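A rough sketch of how such a composition vector could be built. As an assumption, I use atomic numbers as a stand-in scale, since the snippet says any elemental scale can be substituted; the paper itself uses modified Pettifor numbers. `composition_vector` and `SCALE` are hypothetical names, not part of any library.

```python
# Stand-in scale: atomic numbers (standard values). The paper uses modified
# Pettifor numbers; swapping this dictionary is the only change needed.
SCALE = {"O": 8, "Na": 11, "Cl": 17, "K": 19}


def composition_vector(
    amounts: dict[str, float], scale: dict[str, int], size: int
) -> list[float]:
    """Place each element's fractional amount at the index given by the scale."""
    total = sum(amounts.values())
    vec = [0.0] * size
    for element, amount in amounts.items():
        vec[scale[element]] += amount / total
    return vec


nacl = composition_vector({"Na": 1, "Cl": 1}, SCALE, size=20)
k2o = composition_vector({"K": 2, "O": 1}, SCALE, size=20)
print(nacl[11], nacl[17])  # 0.5 0.5
```

The vector length only has to exceed the largest index on the chosen scale; every position that does not correspond to an element in the compound stays at zero.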

Another snippet:

Even when using simple atomic numbers as the elemental index, the EMD introduces a significant structure to the UMAP generated clusters, leading to clusters with nontrivial shapes, however without the purity of labels observed when using the modified Pettifor scale (Figure S1). Elemental scales such as Pettifor’s original Mendeleev number[13] and alternate orderings of this scale[33] result in plots with similar cluster shapes and purity to the modified Pettifor scale (Figures S2−S6). An alternative approach to the use of compositional vectors X and Y is the use of recently developed vectors of features which are derived from values of physicochemical properties of the elements present in the composition.[34−36]
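For intuition about the EMD on a one-dimensional elemental index, here is a small sketch (not the authors' implementation). It relies on the standard result that, for two distributions of equal total mass on a unit-spaced axis, the earth mover's distance equals the summed absolute difference of their cumulative sums. Atomic numbers serve as the index here, as in the first sentence of the snippet; `emd_1d` and `sparse_to_dense` are hypothetical helper names.

```python
from itertools import accumulate


def emd_1d(x: list[float], y: list[float]) -> float:
    """Earth mover's distance between two vectors on a unit-spaced axis.

    Assumes both vectors have the same length and the same total mass.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    # Running surplus of mass that still has to be moved to the right.
    return sum(abs(c) for c in accumulate(a - b for a, b in zip(x, y)))


def sparse_to_dense(fractions: dict[int, float], size: int) -> list[float]:
    """Expand {index: fraction} into a dense vector of the given length."""
    vec = [0.0] * size
    for index, fraction in fractions.items():
        vec[index] = fraction
    return vec


# Indices are atomic numbers: Na = 11, Cl = 17, K = 19.
nacl = sparse_to_dense({11: 0.5, 17: 0.5}, size=20)
kcl = sparse_to_dense({19: 0.5, 17: 0.5}, size=20)
print(emd_1d(nacl, kcl))  # 4.0
```

The result is half a unit of mass moved eight index steps, from Na to K. On a scale that places Na and K closer together, the same pair of compounds would come out as more similar.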

Composition-based property prediction models

(1) Tian, S. I. P.; Walsh, A.; Ren, Z.; Li, Q.; Buonassisi, T. What Information Is Necessary and Sufficient to Predict Materials Properties Using Machine Learning? 18.
(2) Falkowski, A. R.; Kauwe, S. K.; Sparks, T. D. Optimizing Fractional Compositions to Achieve Extraordinary Properties. Integr Mater Manuf Innov 2021, 10 (4), 689–695. https://doi.org/10.1007/s40192-021-00242-3.
(3) Vasylenko, A.; Antypov, D.; Gusev, V.; Gaultois, M.; Dyer, M.; Rosseinsky, M. Element Selection for Functional Materials Discovery by Integrated Machine Learning of Atomic Contributions to Properties; preprint; In Review, 2022. https://doi.org/10.21203/rs.3.rs-1334648/v1.
(4) Chen, C.; Ong, S. P. AtomSets as a Hierarchical Transfer Learning Framework for Small and Large Materials Datasets. npj Comput Mater 2021, 7 (1), 173. https://doi.org/10.1038/s41524-021-00639-w.
(5) Jha, D.; Choudhary, K.; Tavazza, F.; Liao, W.; Choudhary, A.; Campbell, C.; Agrawal, A. Enhancing Materials Property Prediction by Leveraging Computational and Experimental Data Using Deep Transfer Learning. Nat Commun 2019, 10 (1), 5316. https://doi.org/10.1038/s41467-019-13297-w.
(6) Jha, D.; Ward, L.; Paul, A.; Liao, W.; Choudhary, A.; Wolverton, C.; Agrawal, A. ElemNet: Deep Learning the Chemistry of Materials From Only Elemental Composition. Sci Rep 2018, 8 (1), 17593. https://doi.org/10.1038/s41598-018-35934-y.
(7) Meredig, B.; Agrawal, A.; Kirklin, S.; Saal, J. E.; Doak, J. W.; Thompson, A.; Zhang, K.; Choudhary, A.; Wolverton, C. Combinatorial Screening for New Materials in Unconstrained Composition Space with Machine Learning. Phys. Rev. B 2014, 89 (9), 094104. https://doi.org/10.1103/PhysRevB.89.094104.
(8) Ward, L. A General-Purpose Machine Learning Framework for Predicting. npj Comput Mater 2016, 7.
(9) Gupta, V.; Choudhary, K.; Tavazza, F.; Campbell, C.; Liao, W.; Choudhary, A.; Agrawal, A. Cross-Property Deep Transfer Learning Framework for Enhanced Predictive Analytics on Small Materials Data. Nat Commun 2021, 12 (1), 6595. https://doi.org/10.1038/s41467-021-26921-5.
(10) Dunn, A.; Wang, Q.; Ganose, A.; Dopp, D.; Jain, A. Benchmarking Materials Property Prediction Methods: The Matbench Test Set and Automatminer Reference Algorithm. npj Comput Mater 2020, 6 (1), 138. https://doi.org/10.1038/s41524-020-00406-3.
(11) Goodall, R. E. A.; Lee, A. A. Predicting Materials Properties without Crystal Structure: Deep Representation Learning from Stoichiometry. Nat Commun 2020, 11 (1), 6280. https://doi.org/10.1038/s41467-020-19964-7.
(12) Wang, A. Y.-T.; Kauwe, S. K.; Murdock, R. J.; Sparks, T. D. Compositionally-Restricted Attention-Based Network for Materials Property Predictions. npj Comput Mater 2021, 33. https://doi.org/10.1038/s41524-021-00545-1.