sparks-baird / xtal2png

Encode/decode a crystal structure to/from a grayscale PNG image for direct use with image-based machine learning models such as Palette.
https://xtal2png.readthedocs.io/
MIT License

What is our notion of best-fit for generation, prediction, and relaxation? #12

Open sgbaird opened 2 years ago

sgbaird commented 2 years ago

EDIT: see also issues with the "notion of best" label

Relaxation is probably the most straightforward: use some crystal distance between the relaxed structure and the reference structure. Prediction can be evaluated against known allotropes by taking the lowest crystal distance among the allotropes. Generation is the least straightforward. Perhaps a Pareto hypervolume metric via a fictitious adaptive design campaign (e.g., bulk modulus vs. energy above hull)? Perform hyperparameter optimization, and then run DFT as the final validation.
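A rough sketch of the three notions above, assuming a hypothetical `crystal_distance` that is just the Euclidean distance between fixed-length structure fingerprints (a stand-in, not the metric used in xtal2png or matbench-genmetrics). Relaxation is a single distance, prediction is the minimum distance over known allotropes, and generation is scored with a 2D Pareto hypervolume. The bulk modulus and energy-above-hull values and the reference point are made up for illustration; the reference point is itself a choice the metric depends on.

```python
import math
from typing import Sequence, Tuple

# Hypothetical stand-in: a structure is represented by a fixed-length fingerprint.
Fingerprint = Sequence[float]


def crystal_distance(a: Fingerprint, b: Fingerprint) -> float:
    # Placeholder "crystal distance": Euclidean distance between fingerprints.
    return math.dist(a, b)


def relaxation_error(relaxed: Fingerprint, reference: Fingerprint) -> float:
    # Relaxation: distance between the model's relaxed structure and the reference.
    return crystal_distance(relaxed, reference)


def prediction_error(predicted: Fingerprint, allotropes: Sequence[Fingerprint]) -> float:
    # Prediction: lowest distance to any known allotrope.
    if not allotropes:
        raise ValueError("need at least one known allotrope")
    return min(crystal_distance(predicted, a) for a in allotropes)


def hypervolume_2d(points: Sequence[Tuple[float, float]], ref: Tuple[float, float]) -> float:
    """Area dominated by `points` and bounded by `ref` (both objectives minimized)."""
    # Keep only points that improve on the reference point in both objectives.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    hv, best_f2 = 0.0, ref[1]
    for f1, f2 in pts:  # sweep in increasing first objective
        if f2 < best_f2:  # skip points dominated by an earlier one
            hv += (ref[0] - f1) * (best_f2 - f2)  # add the new horizontal strip
            best_f2 = f2
    return hv


print(round(prediction_error((0.1, 0.2), [(0.0, 0.2), (1.0, 1.0)]), 6))  # 0.1

# Generation: maximize bulk modulus K (GPa), minimize E_hull (eV/atom).
# K is negated so both objectives are minimized. Values are made up.
candidates = [(100.0, 0.05), (150.0, 0.10), (80.0, 0.20)]
points = [(-k, e) for k, e in candidates]
print(round(hypervolume_2d(points, ref=(0.0, 0.5)), 6))  # 65.0
```

The third candidate is dominated by the first, so it adds nothing to the hypervolume; a better generative model would populate the front with more non-dominated candidates and enlarge the dominated area.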

sgbaird commented 2 years ago

Another option that struck me is using a time-split. For example:

  1. Split Materials Project into two pieces based on a date cutoff
  2. Unconditionally generate many crystal structures
    1. 10+ million? Maybe check convergence with the number of generated structures (a hyperparameter of the metric)
  3. Compute the fraction of entries in the latter half of Materials Project that have a close match among the generated structures; a higher fraction means better performance
    1. Match tolerance(s) will be other hyperparameter(s) for the metric

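
The time-split steps above can be sketched as follows. Everything here is hypothetical: `Entry`, `time_split`, and `match_fraction` are illustrative names, the dates and fingerprints are made up, and "close match" is approximated by a Euclidean fingerprint distance below a tolerance. A real implementation would compare actual structures (for example with pymatgen's `StructureMatcher`) and would replace the hard-coded `generated` list with samples from a generative model.

```python
import math
from dataclasses import dataclass
from datetime import date
from typing import List, Sequence, Tuple


@dataclass
class Entry:
    material_id: str                # hypothetical identifier field
    created: date                   # hypothetical date used for the split
    fingerprint: Tuple[float, ...]  # stand-in for a structure representation


def time_split(entries: Sequence[Entry], cutoff: date) -> Tuple[List[Entry], List[Entry]]:
    """Step 1: entries before `cutoff` (training side) and on/after it (held out)."""
    early = [e for e in entries if e.created < cutoff]
    late = [e for e in entries if e.created >= cutoff]
    return early, late


def match_fraction(generated: Sequence[Tuple[float, ...]], late: Sequence[Entry], tol: float) -> float:
    """Step 3: fraction of held-out entries with a generated structure within `tol`."""
    if not late:
        raise ValueError("held-out split is empty")
    matched = sum(
        1 for e in late if any(math.dist(g, e.fingerprint) <= tol for g in generated)
    )
    return matched / len(late)


entries = [
    Entry("a", date(2015, 1, 1), (0.0, 0.0)),
    Entry("b", date(2019, 6, 1), (1.0, 1.0)),
    Entry("c", date(2020, 3, 1), (2.0, 0.0)),
]
early, late = time_split(entries, cutoff=date(2018, 1, 1))

# Step 2 stand-in: made-up "generated" fingerprints instead of model samples.
generated = [(1.05, 1.0), (5.0, 5.0), (2.0, 0.02)]

# Convergence check: the metric as a function of the number of generated structures.
for n in range(1, len(generated) + 1):
    print(n, match_fraction(generated[:n], late, tol=0.1))
# prints: 1 0.5 / 2 0.5 / 3 1.0 (one per line)
```

Both the number of generated structures and `tol` change the score, which is why they are treated as hyperparameters of the metric rather than fixed constants.
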
sgbaird commented 2 years ago

We can also look at model accuracy on Matbench task(s) as a way to probe the "quality" of the xtal2png representation from another perspective (#50).

sgbaird commented 2 years ago

DFT simulations will also be important as a high-cost validation.

sgbaird commented 2 years ago

From mp-time-split:

... MPTS-52 can be used with the metrics introduced in CDVAE's compute_metrics.py script (see https://github.com/txie-93/cdvae/issues/10). ...

sgbaird commented 2 years ago

Having trouble getting CDVAE to run (https://github.com/txie-93/cdvae/issues/19), but I can probably splice out compute_metrics.py while that's getting sorted out.

sgbaird commented 2 years ago

compute_metrics.py seems to be tightly integrated with the rest of the codebase. The simplest solution might be to fork CDVAE, make it pip- and conda-installable, and then include it as a dependency of matbench-genmetrics.

sgbaird commented 2 years ago

Might hold off on CDVAE metrics for now. See https://github.com/txie-93/cdvae/issues/10

sgbaird commented 2 years ago

As an update, matbench-genmetrics now runs in a reasonable amount of time: https://github.com/sparks-baird/matbench-genmetrics/blob/main/notebooks/1.0-matbench-genmetrics-basic.ipynb