materialsvirtuallab / megnet

Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals
BSD 3-Clause "New" or "Revised" License

Train the model for customised train-test split #281

Open kdmsit opened 3 years ago

kdmsit commented 3 years ago

I have around 40K crystal structures from the Materials Project database in .cif format. I want to train the MEGNet model from scratch using my own train-test split (e.g., train 20%, test 80%) for the formation energy and band gap properties. Could you please help me with how to do that?
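For illustration, a custom split like the one described could be built from the list of structure IDs before any model code runs. This is only a minimal sketch using the Python standard library; the helper name `split_ids`, the seed, and the assumption that the structures are identified by the integers 0 to 39999 are hypothetical and not taken from megnet or from the actual dataset.

```python
import random


def split_ids(ids, train_fraction=0.2, seed=0):
    """Shuffle the IDs reproducibly and split them into train and test lists.

    `split_ids` is a hypothetical helper, not part of megnet.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(ids)
    # A dedicated Random instance keeps the split reproducible without
    # touching the global random state.
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(train_fraction * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]


# Assumption: structures are stored as 0.cif, 1.cif, ..., 39999.cif.
idx_train, idx_test = split_ids(range(40000), train_fraction=0.2)
print(len(idx_train), len(idx_test))
```

Fixing the seed means the same structures land in the test set on every run, which makes results from different training runs comparable.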

chc273 commented 3 years ago

@kdmsit can you be more specific?

Please see the example notebooks for how to use the models. Also, the MEGNet model predicts intensive properties, so for extensive properties you will need to convert them to per-atom quantities.
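As a rough sketch of that conversion: an extensive target such as the total energy of a cell can be divided by the number of atoms in the cell before training. The helper name `per_atom` and the numbers below are made up for illustration. With pymatgen, the atom count of a `Structure` is available as `len(structure)`.

```python
def per_atom(total_value, num_atoms):
    """Convert an extensive quantity (e.g. total energy in eV) to a per-atom value.

    `per_atom` is a hypothetical helper, not part of megnet.
    """
    if num_atoms <= 0:
        raise ValueError("num_atoms must be positive")
    return total_value / num_atoms


# Illustrative numbers only: a 6-atom cell with a total energy of -30.0 eV.
# For a pymatgen Structure, num_atoms would be len(structure).
energy_per_atom = per_atom(-30.0, 6)
print(energy_per_atom)
```

Predictions made by a model trained on per-atom targets can be multiplied by the atom count again if the extensive value is needed.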

kdmsit commented 3 years ago

I am using the following code snippet for it:

import os

import numpy as np
from megnet.data.crystal import CrystalGraph
from megnet.models import MEGNetModel
from pymatgen.core.structure import Structure

# data_path, id_prop_data, index, idx_train and idx_test are assumed to be defined earlier.
nfeat_bond = 100
epoch = 1000
r_cutoff = 5
gaussian_centers = np.linspace(0, r_cutoff + 1, nfeat_bond)
gaussian_width = 0.5
graph_converter = CrystalGraph(cutoff=r_cutoff)
model = MEGNetModel(graph_converter=graph_converter, centers=gaussian_centers, width=gaussian_width)

# Convert each training structure to a graph; keep the ones that fail for inspection.
graphs_valid = []
targets_valid = []
structures_invalid = []
for i in idx_train:
    crystal = Structure.from_file(os.path.join(data_path, str(i) + '.cif'))
    p = float(id_prop_data[i][index])
    try:
        graph = graph_converter.convert(crystal)
        graphs_valid.append(graph)
        targets_valid.append(p)
    except Exception:
        structures_invalid.append(crystal)
print("Train Data Load Done......")

print("Training the model......")
model.train_from_graphs(graphs_valid, targets_valid, epochs=epoch)

# Predict each test structure and compute its absolute error.
for i in idx_test:
    try:
        new_structure = Structure.from_file(os.path.join(data_path, str(i) + '.cif'))
        pred_target = model.predict_structure(new_structure)
        true_target = float(id_prop_data[i][index])
        ae = abs(float(pred_target[0]) - true_target)

But I am not able to achieve good results. Could you please help me understand whether I am training the model correctly?
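One way to put a number on "good results" is to average the per-structure absolute errors into a mean absolute error (MAE) and compare it with a trivial baseline that always predicts the mean of the training targets. The sketch below is plain Python with made-up values; the function names and the numbers are illustrative assumptions, not output from the model above.

```python
def mean_absolute_error(predictions, targets):
    """Average of |prediction - target| over all test structures."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not predictions:
        raise ValueError("at least one prediction is required")
    total = sum(abs(p - t) for p, t in zip(predictions, targets))
    return total / len(predictions)


def mean_baseline_mae(train_targets, test_targets):
    """MAE obtained by always predicting the mean of the training targets."""
    mean_value = sum(train_targets) / len(train_targets)
    return mean_absolute_error([mean_value] * len(test_targets), test_targets)


# Made-up numbers for illustration only (e.g. formation energies in eV/atom).
train_targets = [-1.0, -2.0, -3.0]
test_targets = [-1.5, -2.5]
predictions = [-1.25, -2.0]

model_mae = mean_absolute_error(predictions, test_targets)
baseline_mae = mean_baseline_mae(train_targets, test_targets)
print(model_mae, baseline_mae)
```

A model MAE close to the baseline would suggest the model has learned little beyond the average target value.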

chc273 commented 3 years ago

@kdmsit I don't see an issue in the code. In general, you need to check whether the target properties are intensive and whether they can be predicted from the structure. Please provide more details if you still cannot find a solution.

chc273 commented 3 years ago

If it is only MP structures, formation energy, and band gap, those should be fairly easy to train. Check this notebook for an example: https://github.com/materialsvirtuallab/megnet/blob/master/notebooks/crystal_example.ipynb