txie-93 / cgcnn

Crystal graph convolutional neural networks for predicting material properties.
MIT License

Why can't the CGCNN model predict graphene and diamond? #27

Open Amadeus-System opened 3 years ago

Amadeus-System commented 3 years ago

I trained a CGCNN model using the Materials Project IDs in "cgcnn/data/material-data/mp-ids-46744.csv" and then tested it by predicting "energy_per_atom" (the target value in the Materials Project) for graphene and diamond.

Both materials consist of a single element, carbon (C), but they have different structures, so the model should be able to distinguish between them. However, the results are not good: the model fails to predict values close to the target energy_per_atom.

The following screenshot shows the results:

[Screenshot: prediction results for graphene and diamond]

In the above image, the target values of graphene and diamond are -9.0904 and -9.2203 eV/atom, respectively, but the predicted values are -1.7248 and -1.7681, respectively (the normalized target values are both 0.7071).

So I would like to know why the trained CGCNN model cannot predict the target values of graphene and diamond.

I would appreciate your explanation. Thank you.

txie-93 commented 3 years ago

This is an interesting observation. Are you training your model to predict the total energy per atom? What is your overall MAE? The errors for both diamond and graphene look very large. Are they the worst-predicted materials?
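Both questions (overall MAE, and whether these two are the worst-predicted materials) can be answered by ranking the test set by absolute error. A minimal sketch, assuming predictions are available as (id, target, prediction) tuples; `error_report` is a hypothetical helper, not part of CGCNN:

```python
def error_report(rows):
    """Return the MAE and the rows sorted by absolute error, worst first.

    rows: iterable of (material_id, target, prediction) tuples.
    """
    scored = [(mid, t, p, abs(p - t)) for mid, t, p in rows]
    mae = sum(r[3] for r in scored) / len(scored)
    worst_first = sorted(scored, key=lambda r: r[3], reverse=True)
    return mae, worst_first


# Only the two materials discussed in this thread; in practice, pass every test row.
rows = [
    ("graphene", -9.0904, -1.7248),
    ("diamond", -9.2203, -1.7681),
]
mae, worst = error_report(rows)
print(round(mae, 4))  # 7.4089
for mid, t, p, err in worst:
    print(mid, round(err, 4))  # diamond 7.4522, then graphene 7.3656
```

Comparing where these two materials fall in the full ranking against the overall MAE shows whether they are outliers or part of a broader pattern.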

One possibility is that "total energy" is not well defined in DFT calculations: it depends on the functional used for the calculation, so this number may not be transferable to some carbon structures. It is usually easier to compare "formation energy per atom".
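For reference, the formation energy per atom is usually defined relative to elemental reference energies computed with the same functional, so functional-dependent offsets in the total energy partly cancel. A standard form, where $n_i$ is the number of atoms of element $i$ in the cell and $\mu_i$ is that element's reference energy per atom (the notation is ours, not from this thread):

$$
\Delta E_f = \frac{E_{\mathrm{tot}} - \sum_i n_i \mu_i}{\sum_i n_i}
$$

For a pure-carbon structure such as diamond or graphene, this reduces to $E_{\mathrm{tot}}/N - \mu_{\mathrm{C}}$, i.e. the energy per atom measured relative to the carbon reference state.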

Amadeus-System commented 3 years ago

The overall MAE was about 0.1 to 0.16. My model was trained using cgcnn/data/material-data/mp-ids-46744.csv (60% for training, 20% for validation, and 20% for testing).
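A random 60/20/20 split like the one described can be sketched as follows. This is an illustrative helper (`split_ids` is hypothetical), not CGCNN's own data loader, and the exact assignment depends on the random seed:

```python
import random


def split_ids(ids, train=0.6, val=0.2, seed=0):
    """Shuffle ids and split them into train/validation/test lists.

    The test fraction is whatever remains after the train and val portions.
    """
    ids = list(ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_train = int(train * len(ids))
    n_val = int(val * len(ids))
    return (
        ids[:n_train],
        ids[n_train:n_train + n_val],
        ids[n_train + n_val:],
    )


# Placeholder IDs for illustration only.
ids = [f"id-{i}" for i in range(10)]
train_ids, val_ids, test_ids = split_ids(ids)
print(len(train_ids), len(val_ids), len(test_ids))  # 6 2 2
```

Whatever the split, materials that are not in the ID file at all are never seen during training.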

Graphene and diamond (Materials Project IDs mp-66 and mp-48, respectively) are not in the training dataset. I wanted to see whether the model could predict the difference between them, but it did not work as I expected.
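Whether a structure appears in the ID list can be checked directly. A minimal sketch, assuming the file stores one Materials Project ID in the first column of each row (an assumption about the file layout); `load_ids` and `missing_ids` are hypothetical helpers:

```python
import csv
import io


def load_ids(fileobj):
    """Read material IDs from the first column of a CSV file object."""
    return {row[0].strip() for row in csv.reader(fileobj) if row}


def missing_ids(query_ids, known_ids):
    """Return the query IDs that do not appear in the known set."""
    return [mid for mid in query_ids if mid not in known_ids]


# In-memory stand-in with placeholder IDs; in practice use open(path, newline="").
sample = io.StringIO("mp-1\nmp-2\nmp-3\n")
known = load_ids(sample)
print(missing_ids(["mp-66", "mp-48", "mp-2"], known))  # ['mp-66', 'mp-48']
```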

So I have an additional question about the dataset. Which Materials Project IDs are in your "mp-ids-46744.csv" file? Were they selected randomly, or does the file contain many materials with a specific structure type (such as perovskites)?

txie-93 commented 3 years ago

Why do you think the model cannot predict the difference between graphene and diamond? The predicted values are different (-1.7248 vs. -1.7681). The problem seems to be more related to the large test error (~8 eV/atom vs. an overall MAE of 0.1-0.16). CGCNN can differentiate graphene and diamond given the right training data. In this paper, we used CGCNN to predict the energies of different boron structures (https://aip.scitation.org/doi/abs/10.1063/1.5047803).
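The numbers quoted in this thread support this reading: the model orders the two structures the same way as the targets (graphene higher in energy than diamond), but both predictions are offset by roughly 7.4 eV/atom. A short check using only those values:

```python
# Values quoted in this thread (eV/atom).
target = {"graphene": -9.0904, "diamond": -9.2203}
pred = {"graphene": -1.7248, "diamond": -1.7681}

# Energy difference between the two structures, target vs. prediction.
target_gap = target["graphene"] - target["diamond"]
pred_gap = pred["graphene"] - pred["diamond"]
print(round(target_gap, 4), round(pred_gap, 4))  # 0.1299 0.0433

for name in target:
    # Absolute error per material.
    print(name, round(abs(pred[name] - target[name]), 4))
```

Both gaps are positive, so the ordering is preserved, while the absolute errors dominate everything else.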

At the time we wrote the paper, the Materials Project database was much smaller. The 46744 materials were all of the materials available at that time, after removing some low-quality data points.