Chemical accuracy values

JessicaSchrouff commented 5 years ago

Hello, I am trying to reproduce the results from Gilmer et al., 2017 and have noticed that they mention normalizing the targets. However, the chemical accuracies listed in their Supplementary Table do not seem to be normalized. In your code, those values differ from the ones in both Faber et al., 2017 and Gilmer et al., 2017. Would you mind explaining how you derived them? Also, does your validation split reproduce the one described in Gilmer et al., 2017, or was it random? Thank you! Best,

51alg commented 5 years ago

Hi,

We normalize the regression properties $y$ by subtracting the dataset mean $\mu_y$ and dividing by the dataset standard deviation $\sigma_y$.

$\hat{y}=\frac{y-\mu_y}{\sigma_y}$
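As a rough illustration of this normalization (a minimal sketch, not the repository's code), the snippet below standardizes one property and maps predictions back. The helper names `normalize_targets` and `denormalize` are hypothetical, the toy values are not QM9 data, and using the population standard deviation (`pstdev`) rather than the sample estimator is an assumption.

```python
from statistics import mean, pstdev

def normalize_targets(values):
    """Return (normalized values, mu_y, sigma_y) for one regression property.

    Assumption: sigma_y is the population standard deviation of the dataset.
    """
    mu = mean(values)
    sigma = pstdev(values)
    normalized = [(y - mu) / sigma for y in values]
    return normalized, mu, sigma

def denormalize(y_hat, mu, sigma):
    """Invert the normalization: y = sigma_y * y_hat + mu_y."""
    return sigma * y_hat + mu

# Toy values (hypothetical, not taken from QM9).
targets = [1.0, 2.0, 3.0, 4.0]
y_hat, mu, sigma = normalize_targets(targets)
print(mu, sigma)
```

The network is trained on `y_hat`; `denormalize` is only needed to report errors in the original units.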

The error ratio $\epsilon$ is defined as the absolute error between the predicted property $y$ and the true property $y^*$, divided by the chemical accuracy $a$:

$\epsilon = \frac{|y-y^*|}{a}$
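A small sketch of this definition in the original (unnormalized) units is shown below. The function names are hypothetical, and averaging the per-molecule ratio over a dataset (i.e. MAE divided by $a$) is an assumption about how a single summary number is reported; a value below 1 means the predictions are, on average, within chemical accuracy.

```python
def error_ratio(y_pred, y_true, chem_acc):
    """Per-molecule error ratio: |y - y*| / a, all in the original units."""
    return abs(y_pred - y_true) / chem_acc

def mean_error_ratio(preds, trues, chem_acc):
    """Mean absolute error over a dataset divided by the chemical accuracy a.

    Assumption: this is how a dataset-level error ratio is summarized.
    """
    mae = sum(abs(p - t) for p, t in zip(preds, trues)) / len(preds)
    return mae / chem_acc

# Hypothetical numbers, not from any real model or dataset.
print(error_ratio(1.5, 1.0, 0.5))
print(mean_error_ratio([1.0, 2.0], [1.0, 3.0], 0.5))
```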

We can write this in terms of normalized quantities:

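$\epsilon = \frac{|(\sigma_y\hat{y}+\mu_y) - (\sigma_y\hat{y}^*+\mu_y)|}{a}=\frac{\sigma_y|\hat{y}-\hat{y}^*|}{a}=\frac{|\hat{y}-\hat{y}^*|}{a/\sigma_y}$

The means cancel, and $\sigma_y$ can be pulled out of the absolute value because a standard deviation is non-negative.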

The values of the "normalized chemical accuracy" $a/\sigma_y$ are the numbers seen in the code.
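To sanity-check the derivation numerically, the sketch below computes $\epsilon$ once in the original units and once from normalized quantities with the threshold $a/\sigma_y$; the two should agree up to floating-point error. The toy targets and the accuracy value `chem_acc` are hypothetical, and `normalized_chemical_accuracy` is an illustrative helper, not a function from the repository.

```python
from statistics import mean, pstdev

def normalized_chemical_accuracy(chem_acc, sigma):
    """Threshold a / sigma_y to compare against errors on normalized targets."""
    return chem_acc / sigma

# Toy dataset and accuracy (hypothetical, not QM9 values).
targets = [0.10, 0.25, 0.40, 0.55, 0.70]
mu, sigma = mean(targets), pstdev(targets)
chem_acc = 0.043  # assumed to be in the same units as the targets

y_true, y_pred = 0.40, 0.47

# Error ratio in the original units: |y - y*| / a
eps_raw = abs(y_pred - y_true) / chem_acc

# Same ratio from normalized quantities: |y_hat - y_hat*| / (a / sigma_y)
y_true_hat = (y_true - mu) / sigma
y_pred_hat = (y_pred - mu) / sigma
eps_norm = abs(y_pred_hat - y_true_hat) / normalized_chemical_accuracy(chem_acc, sigma)

print(eps_raw, eps_norm)
```

This is why a model trained and evaluated on normalized targets can be compared against chemical accuracy without ever converting back to physical units.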

Regarding reproducing Gilmer et al.: note that we do not compute all the node features used in that paper (e.g., acceptor/donor status, hybridization, etc.). You can try to find functions for calculating these in RDKit. The validation split we use is one of several provided to us by Gilmer et al.; however, we cannot guarantee that it is the split they actually used in their paper.

JessicaSchrouff commented 5 years ago

Thank you very much for the reply! This makes sense.

Note: I am using the additional chemical properties (except for partial charges) as node features, but have found that they give little improvement.

Thank you for your time!

(microsoft/gated-graph-neural-network-samples, issue #19: Chemical accuracy values)