pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License

QM9 dataset only has 12 target series #82

Closed sunfanyunn closed 5 years ago

sunfanyunn commented 5 years ago

The QM9 dataset should have 13 regression targets, but running the example qm9_nn_conv.py shows that there are only 12 target series.

rusty1s commented 5 years ago

Yeah, the Omega target is missing, because it is not provided in the official dataset. I do not know where to get the last target or how to calculate it. Do you?

sunfanyunn commented 5 years ago

Maybe take a look here? https://github.com/priba/nmp_qc/blob/master/data/download.py

rusty1s commented 5 years ago
I.  Property  Unit         Description
--  --------  -----------  --------------
 1  tag       -            "gdb9"; string constant to ease extraction via grep
 2  index     -            Consecutive, 1-based integer identifier of molecule
 3  A         GHz          Rotational constant A
 4  B         GHz          Rotational constant B
 5  C         GHz          Rotational constant C
 6  mu        Debye        Dipole moment
 7  alpha     Bohr^3       Isotropic polarizability
 8  homo      Hartree      Energy of Highest occupied molecular orbital (HOMO)
 9  lumo      Hartree      Energy of Lowest occupied molecular orbital (LUMO)
10  gap       Hartree      Gap, difference between LUMO and HOMO
11  r2        Bohr^2       Electronic spatial extent
12  zpve      Hartree      Zero point vibrational energy
13  U0        Hartree      Internal energy at 0 K
14  U         Hartree      Internal energy at 298.15 K
15  H         Hartree      Enthalpy at 298.15 K
16  G         Hartree      Free energy at 298.15 K
17  Cv        cal/(mol K)  Heat capacity at 298.15 K

No omega :(
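For reference, here is a minimal sketch of the target-index-to-property mapping implied by the readme table above. The ordering is an assumption based on this thread (target 0 = mu, so the tag/index columns and the rotational constants A, B, C are not regression targets, and omega is missing):

```python
# Hypothetical mapping of QM9 target indices to properties, inferred
# from the readme table above: tag, index, and the rotational
# constants A, B, C are excluded, and omega is unavailable,
# leaving 12 targets starting at mu.
QM9_TARGETS = [
    ("mu",    "Debye"),        # 0: dipole moment
    ("alpha", "Bohr^3"),       # 1: isotropic polarizability
    ("homo",  "Hartree"),      # 2: HOMO energy
    ("lumo",  "Hartree"),      # 3: LUMO energy
    ("gap",   "Hartree"),      # 4: LUMO - HOMO gap
    ("r2",    "Bohr^2"),       # 5: electronic spatial extent
    ("zpve",  "Hartree"),      # 6: zero point vibrational energy
    ("U0",    "Hartree"),      # 7: internal energy at 0 K
    ("U",     "Hartree"),      # 8: internal energy at 298.15 K
    ("H",     "Hartree"),      # 9: enthalpy at 298.15 K
    ("G",     "Hartree"),      # 10: free energy at 298.15 K
    ("Cv",    "cal/(mol K)"),  # 11: heat capacity at 298.15 K
]

assert len(QM9_TARGETS) == 12  # omega would be target 12 if it existed
```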

sunfanyunn commented 5 years ago

Thanks for responding. I have two more questions:

  1. Are the targets arranged in the same sequence as above (and in most papers)? For example, does target = 1 correspond to alpha?

  2. Is the example qm9_nn_conv.py implementing the paper Neural Message Passing for Quantum Chemistry? Are you able to reproduce their results?

rusty1s commented 5 years ago
  1. Yes, the ordering is the same.
  2. The qm9_nn_conv example tries to reimplement the Gilmer paper (as best as I could), which uses:
     (a) a fully-connected input graph,
     (b) the node features from Table 1,
     (c) the edge network from Section 5.1,
     (d) the Set2Set operator as the global aggregation scheme,
     (e) and updates node embeddings with a GRU module.
     Results are nearly identical.
sunfanyunn commented 5 years ago

Thanks for the quick response, but to be honest I wasn't able to get results similar to their paper. For example, using 0 as the target series, we should be able to get a test MAE of 0.03, right (note that they report the error ratio in their tables)? But directly running the example does not get anywhere close.

Please let me know if I have misunderstood something.
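To make the "error ratio" point concrete, here is a small sketch of the conversion between a raw-unit MAE and the ratio Gilmer et al. report (MAE divided by chemical accuracy). The 0.1 Debye chemical-accuracy threshold for mu used below is an assumed value; check the paper's tables for the exact numbers:

```python
# Sketch: converting between a raw-unit MAE and the "error ratio"
# reported by Gilmer et al. (MAE divided by chemical accuracy).
# The 0.1 Debye chemical-accuracy threshold for mu is an assumption
# here; the paper lists the exact per-target thresholds.

def error_ratio(mae, chemical_accuracy):
    """MAE expressed as a multiple of chemical accuracy."""
    return mae / chemical_accuracy

# If the paper reports an error ratio of 0.30 for mu, the raw MAE is:
mae_mu = 0.30 * 0.1  # 0.030 Debye, i.e. the 0.03 figure quoted above
assert abs(error_ratio(mae_mu, 0.1) - 0.30) < 1e-12
```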

rusty1s commented 5 years ago

I will look into it.

rusty1s commented 5 years ago

Yes, there was a small bug due to changes in the API of NNConv. Currently getting to 0.08 TEST MAE after 100 epochs (target 0).

Authors report MAE after 540 epochs (with possibly different hyperparameters):

T was constrained to be in the range 3 ≤ T ≤ 8. The number of set2set computations M was chosen from the range 1 ≤ M ≤ 12. All models were trained using SGD with the ADAM optimizer, with batch size 20 for 3 million steps (540 epochs).
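As a back-of-the-envelope check on those numbers: with batch size 20, 3 million steps works out to roughly 540 epochs only if the training split holds on the order of 110k molecules (an assumption here; QM9 has roughly 134k molecules in total, and Gilmer et al. hold out validation and test sets):

```python
# Sanity check of "3 million steps (540 epochs)" at batch size 20.
# The ~110k training-set size is an assumption (QM9 has ~134k
# molecules; validation/test sets are held out).
batch_size = 20
total_steps = 3_000_000
train_size = 110_000                        # assumed training split

steps_per_epoch = train_size // batch_size  # 5500 steps per epoch
epochs = total_steps / steps_per_epoch      # ~545 epochs

assert steps_per_epoch == 5500
assert 530 < epochs < 550  # consistent with the quoted 540 epochs
```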

sunfanyunn commented 5 years ago

Thanks a lot!

Laksh1997 commented 4 years ago

Just to quickly comment here: