Issue with Stress training

When I train an M3GNet model, the output written to my Slurm log looks like this:

Epoch 87: 71%|███████▏ | 25/35 [00:05<00:02, 4.56it/s, v_num=0, val_Total_Loss=1.410, val_Energy_MAE=0.385, val_Force_MAE=0.298, val_Stress_MAE=0.000, val_Magmom_MAE=0.000, val_Energy_RMSE=0.477, val_Force_RMSE=0.929, val_Stress_RMSE=0.000, val_Magmom_RMSE=0.000, train_Total_Loss=0.259, train_Energy_MAE=0.328, train_Force_MAE=0.113, train_Stress_MAE=0.000, train_Magmom_MAE=0.000, train_Energy_RMSE=0.422,

The stress MAE and RMSE are reported as 0.000 for both training and validation, which suggests that energies and forces are being optimized but stresses are not. Assuming this is not intended, how do I fix it?
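To make the symptom easier to spot in a long log, the metrics can be pulled out of the progress-bar line and filtered for exact zeros. This is a minimal sketch using only the standard library; `parse_metrics` is a hypothetical helper (not part of matgl or PyTorch Lightning), and `log_line` is a shortened copy of the line shown above.

```python
import re

# Shortened copy of the progress-bar line from the Slurm output above.
log_line = (
    "v_num=0, val_Total_Loss=1.410, val_Energy_MAE=0.385, val_Force_MAE=0.298, "
    "val_Stress_MAE=0.000, val_Magmom_MAE=0.000, train_Energy_MAE=0.328, "
    "train_Force_MAE=0.113, train_Stress_MAE=0.000, train_Magmom_MAE=0.000"
)


def parse_metrics(line):
    """Return {metric_name: value} for every name=number pair in the line."""
    pairs = re.findall(r"(\w+)=(-?\d+(?:\.\d+)?)", line)
    return {name: float(value) for name, value in pairs}


metrics = parse_metrics(log_line)

# v_num is the logger version, not a metric, so it is excluded from the check.
zero_metrics = sorted(
    name for name, value in metrics.items() if name != "v_num" and value == 0.0
)
print(zero_metrics)
```

On the line above this flags the stress and magmom metrics for both the training and validation splits, while the energy and force metrics are non-zero.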

My submit script is attached; it is closely based on the example training files provided: training_m3gnet_potential_with_pytorch_lightning.py.zip

A single entry in my training file looks like: {"structure": {"@module": "pymatgen.core.structure", "@class": "Structure", "charge": 0, "lattice": {"matrix": [[2.69921351, 0.0, 0.0], [0.10334238, 2.68191814, 0.0], [-0.21116918, 0.20981173, 5.43790006]], "pbc": [true, true, true], "a": 2.69921351, "b": 2.6839084479849764, "c": 5.446041722863998, "alpha": 87.8793442745315, "beta": 92.22218948176963, "gamma": 87.7933128488942, "volume": 39.36533742656342}, "properties": {}, "sites": [{"species": [{"element": "Nb", "occu": 1}], "abc": [0.00241095, 0.0013711, 0.50067853], "xyz": [-0.0990785130745529, 0.10872540651491089, 2.7226398083277115], "properties": {}, "label": "Nb"}, {"species": [{"element": "Br", "occu": 1}], "abc": [0.00234791, 0.99829116, 0.99897609], "xyz": [-0.10144966696528132, 2.886932072677178, 5.4323321397495645], "properties": {}, "label": "Br"}]}, "frame_properties": {"e_fr_energy": -36.85148612, "e_wo_entrp": -36.85067928, "e_0_energy": -36.8510827, "forces": [[-0.00269087, -0.01388798, 0.00205097], [0.00269087, 0.01388798, -0.00205097]], "stresses": [[-31.75161844, 13.6118264, 9.71897911], [13.61183087, -11.30419982, -10.60078758], [9.71907457, -10.60089905, -202.29550347]], "magmoms": [[-0.008], [-0.0]]}}

So when the stresses are read, I believe they should match the format used in the example training files.
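A minimal sketch of a sanity check on such an entry, using only the standard library. `check_stress` is a hypothetical helper, not a matgl function, and `entry_json` is a shortened copy of the entry above with only the fields needed here. It confirms that the stress tensor is present, 3x3, and not all zeros; it does not check the units or sign convention that the training code expects.

```python
import json

# Shortened copy of the training entry above; only the checked fields are kept.
entry_json = """
{"frame_properties": {
    "e_0_energy": -36.8510827,
    "forces": [[-0.00269087, -0.01388798, 0.00205097],
               [0.00269087, 0.01388798, -0.00205097]],
    "stresses": [[-31.75161844, 13.6118264, 9.71897911],
                 [13.61183087, -11.30419982, -10.60078758],
                 [9.71907457, -10.60089905, -202.29550347]]}}
"""


def check_stress(entry):
    """Return a list of problems found in the entry's stress tensor (empty if none)."""
    stress = entry.get("frame_properties", {}).get("stresses")
    if stress is None:
        return ["no 'stresses' key"]
    problems = []
    if len(stress) != 3 or any(len(row) != 3 for row in stress):
        problems.append("stress is not 3x3")
    if all(value == 0.0 for row in stress for value in row):
        problems.append("stress is all zeros")
    return problems


entry = json.loads(entry_json)
problems = check_stress(entry)
print(problems or "stress tensor looks well-formed")
```

If this passes for every entry, the raw data are unlikely to be the source of the zero stress metrics, and the cause is more likely in how the labels or loss weights are set up in the training script.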

This is all done using dgl==2.2a240410+cu121, matgl==1.1.1, and pytorch-lightning==2.2.1.

Any advice on how to fix this stress issue is appreciated. Perhaps @kenko911 or @JiQi535 have experienced this?

materialsvirtuallab / matgl

Issue with Stress training #280