leichen2018 / GNN-Substructure-Counting

The official implementation of "Can Graph Neural Networks Count Substructures?" (NeurIPS 2020)

synthetic dataset #2

Closed: balcilar closed this issue 3 years ago

balcilar commented 4 years ago

Thanks for the paper and the code. However, I am more interested in the synthetic dataset that you prepared. It is indeed a great way of evaluating GNNs and understanding their limits. Do you have any plans to share it?

leichen2018 commented 4 years ago

Yes. The currently uploaded version is still incomplete. I will try to finish it within a few days.

Thanks for your attention and patience.

leichen2018 commented 4 years ago

Hi Muhammet, I've uploaded the synthetic datasets. Please let me know if you have any questions. Hope this is helpful for your research!

balcilar commented 3 years ago

Thanks a lot. I appreciate it.

balcilar commented 3 years ago

It seems the variance of the synthetic dataset's labels is not correct. I just checked: the regular graphs' chordal_cycle variance is 11.4 and the tailed_triangle variance is 163.6585, which are reported as 102 and 1472 in the paper. Is this a typo? Were the results calculated with the true variance or with the wrongly reported one?

leichen2018 commented 3 years ago

I agree with you that the label variances for tailed triangles and chordal cycles in our released synthetic datasets are not consistent with the stats reported in our recent arXiv version. As mentioned in https://github.com/leichen2018/GNN-Substructure-Counting/blob/2c213d6c3bb2860afab083e0e91971c9634487e1/synthetic/README.md, the reported variances are 9x larger than the released ones, and the constant factor of 9 is exact.
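As a quick sanity check on the numbers in question (the variances reported in the paper are rounded, which is why the ratios come out slightly under 9):

```python
# Ratio of the paper-reported variance to the released-dataset variance.
print(1472 / 163.6585)  # ~8.995 for tailed triangles
print(102 / 11.4)       # ~8.947 for chordal cycles
```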

The reason for this inconsistency is that two different implementations were used to generate labels for the same graphs when running our synthetic experiments for (LRP-1-3, PPGN) and for (Deep LRP-1-3). Taking 163.6585 vs. 1472 as an example: for (LRP-1-3, PPGN), I divided the MSE loss of models trained with labels of variance 1472 by the variance 1472; for (Deep LRP-1-3), I divided the MSE loss of models trained with labels of variance 163.6585 by the variance 1472 (wrong!). Hence, the results for (Deep LRP-1-3) should be multiplied by 9.
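To make the arithmetic concrete, here is a minimal self-contained sketch with toy stand-in labels (not the real dataset). It assumes the two label conventions differ by a constant factor of 3, as the exact variance ratio of 9 suggests, and shows that dividing the MSE of a model trained on the rescaled labels by the variance of the unscaled labels understates the normalized error by exactly a factor of 9:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the counting labels; the real ones come from the dataset.
y = rng.poisson(lam=20, size=1000).astype(float)  # "paper" labels
y_released = y / 3.0                              # released labels, scaled by 1/3

pred = y + rng.normal(scale=2.0, size=y.size)     # a model's predictions on y
pred_released = pred / 3.0                        # same relative errors on y / 3

mse = np.mean((pred - y) ** 2)
mse_released = np.mean((pred_released - y_released) ** 2)

# Correct normalization: divide each MSE by the variance of its *own* labels.
print(mse / np.var(y), mse_released / np.var(y_released))  # identical

# The mistake: dividing the small-scale MSE by the large-scale variance
# understates the normalized error by exactly Var(y) / Var(y/3) = 9.
print((mse_released / np.var(y)) * 9, mse / np.var(y))     # identical again
```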

To verify this claim, I re-ran the (Deep LRP-1-3) experiments for counting tailed triangles and chordal cycles on both datasets with the following script, where seeds are chosen from {0, 1, 2, 3, 4}:

python3.7m main_synthetic.py --dataset ${DATASET} --task ${TASK} --lr 0.002 --seed ${SEED}
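For completeness, the full sweep over datasets, tasks, and seeds could be scripted along these lines (a sketch only; the --dataset and --task values below are placeholders to be replaced with the identifiers this repo actually uses):

```python
import itertools
import subprocess

# Placeholder identifiers; substitute the repo's real dataset/task names.
datasets = ["DATASET_A", "DATASET_B"]          # e.g. the Erdos-Renyi / RRG splits
tasks = ["tailed_triangle", "chordal_cycle"]   # the two counting targets in question
seeds = range(5)                               # seeds 0-4, as described above

for dataset, task, seed in itertools.product(datasets, tasks, seeds):
    subprocess.run(
        ["python3.7m", "main_synthetic.py",
         "--dataset", dataset, "--task", task,
         "--lr", "0.002", "--seed", str(seed)],
        check=True,
    )
```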

The results are:

| | Erdos-Renyi top 1 | Erdos-Renyi top 3 | RRG top 1 | RRG top 3 |
| --- | --- | --- | --- | --- |
| tailed triangle (reported) | 3.00E-06 | 1.25E-05 | 1.37E-07 | 2.25E-05 |
| chordal cycle (reported) | 8.03E-06 | 9.65E-05 | 7.54E-13 | 3.22E-07 |
| tailed triangle (reported x9) | 2.70E-05 | 1.13E-04 | 1.23E-06 | 2.03E-04 |
| chordal cycle (reported x9) | 7.23E-05 | 8.69E-04 | 6.79E-12 | 2.90E-06 |
| tailed triangle (new) | 3.96E-05 | 1.35E-04 | 1.60E-05 | 2.02E-04 |
| chordal cycle (new) | 6.50E-05 | 8.96E-04 | 3.83E-09 | 3.99E-06 |

We can see that the new results are much closer to the reported ones multiplied by 9. The remaining inconsistency is mainly due to randomness in model training. Perhaps a density plot of the losses would be a better way to present such results, as in Figures 4, 5, and 15 of https://arxiv.org/pdf/2009.11848.pdf .
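A minimal sketch of such a density plot, assuming the per-seed normalized losses have already been collected into arrays (seaborn's kdeplot is one convenient choice here, not necessarily what was used for the paper figures):

```python
import matplotlib.pyplot as plt
import seaborn as sns

def plot_loss_densities(losses_reported_x9, losses_new):
    # Both arguments are assumed to be 1-D arrays of per-run normalized MSE
    # values collected from the sweeps above.
    sns.kdeplot(losses_reported_x9, label="reported x9", log_scale=True)
    sns.kdeplot(losses_new, label="new runs", log_scale=True)
    plt.xlabel("normalized MSE")
    plt.legend()
    plt.show()
```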

Apologies for this inconsistency! We will update the results with the new ones shortly after the conference.

Hope this helps answer your questions.

balcilar commented 3 years ago

Thanks indeed, this is more than enough for me. I already ran it and got almost the same results. I appreciate it!