Closed BowenD-UCB closed 1 year ago
Hmm interesting. The MEGNet implementation doesn't suffer from this problem. So it must be in the threebody computations. @kenko911 pls fix this asap.
I debugged a little. I can confirm that in the forward method, the node_feat and edge_feat are consistent. But the
```python
node_vec = self.readout(g)
vec = torch.hstack([node_vec, state_feat]) if self.inc...
```
are stochastic. So there must be some issue with the readout layer. MEGNet does not use the same readout.
@BowenD-UCB Thanks a lot for reporting the issue and I will figure out the problem and fix it ASAP.
It seems that the M3GNet UP (universal potential) does not have this issue.
That is because the UP is not intensive and so does not call the readout.
I have confirmed that the stochasticity results from the Set2Set in dgl.nn. For instance, I replaced the set2set call in Set2SetReadout with the following code:
```python
for i in range(3):
    set2set = Set2Set(in_feats, n_iters=self.num_steps, n_layers=self.num_layers)
    print(g.ndata["node_feat"])
    out_tensor = set2set(g, g.ndata["node_feat"])
    print(f"out={out_tensor}")
```
The out tensor is different all three times, despite there being no change in initialization and the input.
I just pushed a fix. The trick is to initialize the Set2Set only once; every time it is initialized, it introduces stochasticity.
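For anyone following along, here is a minimal toy sketch (not the actual matgl code; `Layer`, `forward_buggy`, and `forward_fixed` are hypothetical names) of why constructing a module inside the forward pass is stochastic while constructing it once is not: each construction draws fresh random weights.

```python
import random

class Layer:
    """Toy stand-in for dgl's Set2Set: weights are drawn at init time."""
    def __init__(self):
        self.weight = random.random()  # fresh random draw on every init

    def __call__(self, x):
        return x * self.weight

# Buggy pattern: re-initializing inside the forward pass.
def forward_buggy(x):
    layer = Layer()  # new random weights on every call -> stochastic output
    return layer(x)

# Fixed pattern: initialize once, reuse on every call.
shared_layer = Layer()

def forward_fixed(x):
    return shared_layer(x)  # same weights on every call -> deterministic output

outs_buggy = {forward_buggy(1.0) for _ in range(3)}
outs_fixed = {forward_fixed(1.0) for _ in range(3)}
print(len(outs_buggy) > 1, len(outs_fixed) == 1)  # the buggy version varies; the fixed one does not
```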
However, it should be noted that multiple loads of the model will still lead to stochasticity. I am not sure how to solve that. I am reopening this issue for now.
@kenko911 I think you need to look into whether there is some random seed somewhere that needs to be fixed for dgl.nn for the Set2Set initialization. It is rather strange it is not deterministic.
Scratch what I wrote previously. The dgl.seed does not do anything. Right now, the stochasticity with single model loading is solved. But the stochasticity with multiple model loadings is not.
```python
from pymatgen.core import Lattice, Structure
import matgl
import torch
torch.manual_seed(0)
import random
random.seed(0)
import dgl
dgl.seed(0)

# This is the structure obtained from the Materials Project.
struct = Structure.from_spacegroup("Pm-3m", Lattice.cubic(4.1437), ["Cs", "Cl"], [[0, 0, 0], [0.5, 0.5, 0.5]])

for i in range(10):
    model = matgl.load_model("M3GNet-MP-2018.6.1-Eform")
    eform = model.predict_structure(struct)
    print(f"The predicted formation energy for CsCl is {float(eform.numpy()):.3f} eV/atom.")
```
still gives stochastic results. Note that the only difference between @BowenD-UCB's code and this one is that the model loading is now inside the for loop, so a load is carried out on each iteration. If the load is moved outside the for loop, results are deterministic within that session.
It seems this is a known issue with LSTM: https://github.com/pytorch/pytorch/issues/35661
Note that it seems that if you fix the torch.manual_seed, all stochasticity is gone.
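A small self-contained check of that observation (using a plain `torch.nn.LSTM` as a stand-in for the LSTM inside dgl's Set2Set, not the matgl code itself): re-seeding before each construction makes the freshly initialized weights identical.

```python
import torch

# Construct the same LSTM twice; each construction draws fresh random weights
# from the global RNG, so an identical seed yields identical weights.
torch.manual_seed(0)
a = torch.nn.LSTM(4, 4).weight_ih_l0.clone()

torch.manual_seed(0)  # re-seed: the second init draws the same numbers
b = torch.nn.LSTM(4, 4).weight_ih_l0.clone()

print(torch.equal(a, b))  # True: identical weights under the same seed
```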
This is the fix that I pushed. I am still not sure why this non-deterministic behavior occurs in M3GNet but not MEGNet. MEGNet seems to work fine without any manual seed set. It may be due to the number of layers in the set2set: M3GNet uses 3 layers while MEGNet uses just 1.
@BowenD-UCB Just a note that Kenko pushed a final fix this morning. A new version has been released. The main issue was that the final layer was not in the init and so the weights were not saved. The new code fixes this problem. The only model that was affected was the M3GNet formation energy model. I also added an integration test to make sure it does not recur in future.
Hi Shyueping and Kenko
The dgl implementation of M3GNet gives stochastic predictions, which was not observed with the original M3GNet implementation.
I'm worried this might be associated with a serious bug that needs attention. Please have a look: