Open Nokimann opened 3 years ago
Hi,
The QM9 dataset is adapted from the GDrive link provided by Faber et al. They provide the mean/std in the qm9-prop-stats-v1 file and the normalized dataset in the qm9-mol-info-standardized-v1 file.
The units can be found in Faber et al. (Table 3 and 4), or Choudhary et al. (Table 5).
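For anyone who wants to go back to real units from the standardized values, here is a minimal sketch of the round trip, assuming the standardization is a plain z-score applied after the Hartree to eV conversion. The mean/std numbers below are made-up placeholders, not the actual entries of the stats file, and the helper names are hypothetical.

```python
# Sketch only: assumes z = (x_eV - mean) / std, with mean/std computed over the whole dataset.
HARTREE_TO_EV = 27.211386245988  # CODATA 2018 conversion factor


def standardize(value, mean, std):
    """Map a value in real units to a z-score."""
    return (value - mean) / std


def destandardize(z, mean, std):
    """Map a z-score back to real units."""
    return z * std + mean


# Placeholder statistics for illustration, NOT the real QM9 values.
homo_mean_ev = -6.5
homo_std_ev = 0.6

homo_hartree = -0.25                      # example value in the original units
homo_ev = homo_hartree * HARTREE_TO_EV    # unit conversion: Hartree -> eV
z = standardize(homo_ev, homo_mean_ev, homo_std_ev)

# Undo both steps to recover the original number.
recovered_ev = destandardize(z, homo_mean_ev, homo_std_ev)
recovered_hartree = recovered_ev / HARTREE_TO_EV
print(z, recovered_ev, recovered_hartree)
```

With the real per-property mean and std from the stats file substituted in, the same two functions should apply to every standardized column, provided the z-score assumption holds.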
Thank you, @knc6. So there is currently no way to load the mean/std directly from JARVIS?
I don't think it's a good idea to provide only standardized data, as it invites the same evaluation error as in ALIGNN. I've observed this confusion between scaled and original data (and between internal energy and atomization energy) on QM9 in multiple previous papers as well.
It would be great if you would instead provide the data in real units, as done e.g. by PyG: https://pytorch-geometric.readthedocs.io/en/latest/modules/datasets.html#torch_geometric.datasets.QM9
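To illustrate why this matters for evaluation: an MAE computed on standardized targets is in units of standard deviations, and it has to be multiplied by the property's std to be comparable with MAEs reported in real units. A small sketch follows, assuming plain z-score standardization; the numbers are invented for illustration and are not real QM9 statistics or model outputs.

```python
# Sketch only: assumes targets were standardized as z = (x - mean) / std.

def mae(pred, target):
    """Mean absolute error of two equally long sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


# Placeholder statistics, NOT the real QM9 values.
mean = -6.5
std = 0.6

# Hypothetical predictions and targets in standardized units.
pred_z = [0.10, -0.40, 1.20]
target_z = [0.00, -0.50, 1.00]

mae_scaled = mae(pred_z, target_z)   # in units of standard deviations
mae_real = mae_scaled * std          # in real units (here: eV)

# Cross-check: de-standardize first, then compute the MAE directly.
pred_real = [z * std + mean for z in pred_z]
target_real = [z * std + mean for z in target_z]
mae_real_direct = mae(pred_real, target_real)
print(mae_scaled, mae_real, mae_real_direct)
```

The mean cancels in the difference, so only the std enters the rescaling. Reporting `mae_scaled` as if it were in eV would understate or overstate the error by a factor of std.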
I used the following code:
The first entry of the QM9 dataset as obtained from JARVIS:
And the original first entry of the QM9 dataset, with description:
I found that the units are converted and the values normalized. For example, for homo, lumo, ... the values are converted from Hartree to eV and then normalized using the mean and std of the entire dataset.
How can I get the unit and the mean/std factors for each property?