sparks-baird / palette-dft-relaxation-surrogate

Palette image-processing models (e.g., JPEG restoration, inpainting) adapted to act as a surrogate model for density functional theory (DFT) relaxation of crystal structures and for structure prediction.
MIT License

Getting started tasks: functions, indices, Palette building blocks, comparisons, and validation #2

Open sgbaird opened 2 years ago

sgbaird commented 2 years ago

@hasan-sayeed

Some things we'll need:

We'll also need to get familiar with the building blocks of the (unofficial) Palette repository. There are no instructions for non-CLI usage via Python, so we'll need to do some digging (hopefully a small amount) to see which Python modules are actually being called. We'll also need to see to what extent the modules have to be reworked to handle our custom representation rather than image files, and whether the spatial relationships that image models assume are going to throw things off, given the non-spatially-correlated feature order in Michael's representation.
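To make the feature-order concern concrete, here is a minimal sketch using made-up numbers (not Michael's actual representation). It assumes the Palette models rely on convolution-style layers that expect neighboring entries to be related, which we would still need to confirm by reading the repository. The helper `total_variation` is a hypothetical name used only for illustration: it measures how much adjacent entries differ, which stays low for an image-like row and goes up when the same values are placed in an arbitrary order.

```python
import random


def total_variation(row):
    """Sum of absolute differences between neighboring entries."""
    return sum(abs(b - a) for a, b in zip(row, row[1:]))


# A smooth, image-like row: neighboring entries are similar.
smooth = [i / 15 for i in range(16)]

# The same values in an arbitrary order, standing in for a representation
# whose column order carries no spatial meaning (an assumption for
# illustration only).
rng = random.Random(0)
shuffled = smooth[:]
rng.shuffle(shuffled)

tv_smooth = total_variation(smooth)
tv_shuffled = total_variation(shuffled)
print(tv_smooth, tv_shuffled)
```

The values are identical in both rows, only their order differs, yet the local structure that a convolution would pick up on is gone in the shuffled row. Whether this actually hurts performance for our representation is something we would have to test.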

We'll also need/want comparisons with other implementations and a "notion of best" to see if what we do has any real value. A notion of best may be easier to define here than for generative tasks because we can have the ground truth. In the case of the WMB dataset mentioned by @CompRhys in https://github.com/materialsproject/matbench/issues/104#issuecomment-1030739336, the notion of best will be based on how closely the positions of the atoms match those of the relaxed structure, so probably via a crystallographic distance metric, for which there are certainly a number of options:

  1. Zhang, R.; Seth, S.; Cumby, J. Grouped Representation of Interatomic Distances as a Similarity Measure for Crystal Structures; preprint; Chemistry, 2022. https://doi.org/10.26434/chemrxiv-2022-9m4jh.
  2. Thomas, J. C.; Natarajan, A. R.; Van der Ven, A. Comparing Crystal Structures with Symmetry and Geometry. npj Comput Mater 2021, 7 (1), 164. https://doi.org/10.1038/s41524-021-00627-0.
  3. Veremyev, A.; Liyanage, L.; Fornari, M.; Boginski, V.; Curtarolo, S.; Butenko, S.; Buongiorno Nardelli, M. Networks of Materials: Construction and Structural Analysis. AIChE J 2021, 67 (3). https://doi.org/10.1002/aic.17051.
  4. Pan, H.; Ganose, A. M.; Horton, M.; Aykol, M.; Persson, K. A.; Zimmermann, N. E. R.; Jain, A. Benchmarking Coordination Number Prediction Algorithms on Inorganic Crystal Structures. Inorg. Chem. 2021, 60 (3), 1590–1603. https://doi.org/10.1021/acs.inorgchem.0c02996.
  5. Jang, J.; Gu, G. H.; Noh, J.; Kim, J.; Jung, Y. Structure-Based Synthesizability Prediction of Crystals Using Partially Supervised Learning. J. Am. Chem. Soc. 2020, 142 (44), 18836–18843. https://doi.org/10.1021/jacs.0c07384.
  6. Ganose, A. M.; Jain, A. Robocrystallographer: Automated Crystal Structure Text Descriptions and Analysis. MRS Communications 2019, 9 (3), 874–881. https://doi.org/10.1557/mrc.2019.94.

@hasan-sayeed the references above are in the mat-discover/distance-metrics/structure Zotero group folder.
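As a rough baseline for "how closely the positioning of atoms matches that of the relaxed structure", a periodic-aware RMS displacement could look something like the sketch below. This is not one of the metrics from the references above; it is a simple stand-in that assumes both structures share the same lattice and the same atom ordering, and it wraps differences in fractional space, which is only a true minimum-image distance for cells that are not strongly skewed. The function name `wrapped_rmsd` and the example coordinates are hypothetical.

```python
import math


def wrapped_rmsd(frac_a, frac_b, lattice):
    """RMS Cartesian displacement between two sets of fractional coordinates.

    Assumes the same lattice and atom ordering for both structures.
    Rows of `lattice` are the lattice vectors (in angstroms).
    """
    total = 0.0
    for fa, fb in zip(frac_a, frac_b):
        # Wrap each fractional difference into [-0.5, 0.5] so that atoms
        # crossing a cell boundary are not counted as a large displacement.
        d = [(y - x) - round(y - x) for x, y in zip(fa, fb)]
        # Convert the fractional displacement to Cartesian coordinates.
        cart = [sum(d[i] * lattice[i][j] for i in range(3)) for j in range(3)]
        total += sum(c * c for c in cart)
    return math.sqrt(total / len(frac_a))


# Made-up example: a 4 angstrom cubic cell with two atoms.
lattice = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
unrelaxed = [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]]
relaxed = [[0.98, 0.0, 0.0], [0.5, 0.5, 0.55]]

rmsd = wrapped_rmsd(unrelaxed, relaxed, lattice)
print(round(rmsd, 4))  # about 0.1523 angstroms
```

The first atom moves from 0.0 to 0.98 along the first axis, which the wrapping treats as a small shift of -0.02 across the cell boundary rather than a jump of 0.98. For real comparisons we would likely want something that also handles lattice changes and atom reordering, which is where the metrics listed above come in.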

In terms of validation, we can probably use mat3ra (since we don't have a VASP license) to perform relaxation of structures produced via a generative model, since the idea would be to use the code here as a processing step of outputs from generative models. It also opens up the question of creating a dataset of generated compounds and their DFT-relaxed counterparts using e.g. FTCP which @ZahraGhSh has been working on with the help of @SiyuIsaacParkerTian, and of course we can use our group's GAN as well.

CompRhys commented 2 years ago

> In terms of validation, we can probably use mat3ra (since we don't have a VASP license) to perform relaxation of structures produced via a generative model, since the idea would be to use the code here as a processing step of outputs from generative models.

I am no longer at mat3ra but just for awareness they operate on a license passthrough so to use the maintained VASP compilations you need to show evidence of a VASP license.

sgbaird commented 2 years ago

> > In terms of validation, we can probably use mat3ra (since we don't have a VASP license) to perform relaxation of structures produced via a generative model, since the idea would be to use the code here as a processing step of outputs from generative models.
>
> I am no longer at mat3ra but just for awareness they operate on a license passthrough so to use the maintained VASP compilations you need to show evidence of a VASP license.

Very useful info. Thank you @CompRhys! That was my misunderstanding. Will probably need to write funding for a VASP license into whatever next grant we submit.

CompRhys commented 2 years ago

> Very useful info. Thank you @CompRhys! That was my misunderstanding. Will probably need to write funding for a VASP license into whatever next grant we submit.

afaik it's a one-time payment that lasts forever, tied to the PI, and not that expensive compared to experimental fixed costs (~5k)