Open sgbaird opened 2 years ago
The following tutorial uses the grayscale MNIST dataset for classification and might be one of the easiest to adapt to mp_is_metal (or to mp_e_form by adjusting the final layer and the loss function).
@faris-k did a classification task on mp_is_metal and is getting the files ready for a Matbench PR. See the notebook.
It does leave a question on my mind: why are the regression results so poor (much worse than dummy), whereas the classification results are OK (a bit better than dummy)?
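For reference, a minimal sketch of what "better or worse than dummy" means for the regression task. It assumes the dummy baseline always predicts the training-set mean and that the metric is MAE. The helper names and all numbers are made up for illustration and are not results from the notebook.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error between two 1D arrays."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def dummy_mae(y_train, y_test):
    """MAE of a baseline that always predicts the training-set mean (assumed dummy)."""
    constant_pred = np.full(len(y_test), np.mean(y_train))
    return mae(y_test, constant_pred)


# Toy numbers only, not real formation energies or model outputs.
y_train = np.array([-1.0, 0.0, 1.0, 2.0])
y_test = np.array([0.0, 1.0])
model_pred = np.array([0.1, 0.9])

baseline = dummy_mae(y_train, y_test)
model = mae(y_test, model_pred)
ratio = model / baseline  # < 1 means better than dummy, > 1 means worse
print(f"dummy MAE={baseline:.3f}, model MAE={model:.3f}, ratio={ratio:.2f}")
```

A ratio well above 1 for the CNN regression, next to a classification score only slightly above its own dummy, is the pattern described above.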
A follow-up computational experiment (that I think we should leave on the back burner until further notice) is to use the classification model, but with bins for the classes (e.g. formation energy between 0 and 0.05). Implementing ordinal classification would be extra work, so first treat it as categorical. I'm putting this here mostly as a future reference as things progress with xtal2png.
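A minimal sketch of the binning idea, assuming fixed-width bins of 0.05 as in the example above. The bin edges, the helper name `to_class_labels`, and the sample values are hypothetical; a real run would pick edges from the distribution of mp_e_form targets.

```python
import numpy as np


def to_class_labels(e_form, edges):
    """Map continuous targets to integer class labels.

    Label 0 is everything below edges[0]; label len(edges) is everything
    at or above edges[-1]. The labels are treated as plain categories,
    so the ordering between bins is ignored by the classifier.
    """
    return np.digitize(e_form, edges)


# Hypothetical bin edges and targets, for illustration only.
edges = np.array([-0.05, 0.0, 0.05, 0.10])
e_form = np.array([-0.2, -0.01, 0.03, 0.07, 0.5])

labels = to_class_labels(e_form, edges)
n_classes = len(edges) + 1  # size of the classifier's final layer
print(labels, n_classes)
```

The integer labels can then be fed to a standard cross-entropy loss, with the final layer sized to `n_classes`.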
It's also interesting in the sense that hyperparameter-tuned XGBoost did a pretty good job on the regression task (~4x better than the CNN regression), and this was with much less information. We'll see if the results still hold once we double-check that data leakage wasn't coming into play. See #51, specifically https://github.com/sparks-baird/xtal2png/issues/51#issuecomment-1178611188
The task is to use a CNN model for a Matbench submission on regressing formation energy using the xtal2png representation (as an image and/or as an array would be fine). This will help with knowing how "good" the xtal2png representation is from a model accuracy perspective, though I don't expect this to set new benchmarks necessarily.
This might look like using skorch with some type of pytorch CNN module (e.g. ResNetUNet, Net) and an MSE loss function. This skorch tutorial looks like it might help with loading images, though this SO answer is probably better for making the actual dataset to pass to skorch.
If regression is too much of a pain (CNNs aren't used as often for property regression in the image-processing domain), an easy fallback is to do the mp_is_metal binary classification task instead of the e_form regression task.
Related:
Maybe Faris would be interested in working on this, given that he'll be doing some image processing.
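As a rough sketch of the CNN-plus-MSE idea from the task description, here is a tiny plain-PyTorch example. Everything in it is an assumption for illustration: `SmallCNN` is a hypothetical module (not ResNetUNet or the tutorial's Net), the inputs are random single-channel 64x64 tensors standing in for xtal2png images, and the targets are random numbers rather than formation energies.

```python
import torch
from torch import nn


class SmallCNN(nn.Module):
    """Tiny CNN for single-channel images (hypothetical, for illustration)."""

    def __init__(self, n_outputs=1):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pool, any input size
        )
        # The final layer is the part to change between tasks.
        self.head = nn.Linear(16, n_outputs)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))


torch.manual_seed(0)
X = torch.rand(16, 1, 64, 64)  # random stand-in images (assumed shape)
y = torch.randn(16, 1)         # random stand-in regression targets

model = SmallCNN(n_outputs=1)
# Regression uses MSE; for the mp_is_metal fallback, swap in
# nn.BCEWithLogitsLoss() with float 0/1 targets of the same shape.
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

losses = []
for _ in range(5):  # a few steps just to show the training loop
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

pred_shape = tuple(model(X).shape)
print(pred_shape, losses[-1])
```

The same module could be wrapped with skorch's `NeuralNetRegressor` (passing `criterion=nn.MSELoss`) to get a scikit-learn style `fit`/`predict` interface; that step is left out here to keep the sketch dependency-light.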