pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License

Issue with Citeseer dataset #2018

Open milindkesar opened 3 years ago

milindkesar commented 3 years ago

🐛 Bug

Hi, I am having trouble understanding the Citeseer dataset from Planetoid, i.e. the one obtained using dataset = Planetoid(root='data/Planetoid', name='Citeseer'). This dataset seems to have 4552 (undirected) edges and 3327 nodes. However, the paper “Revisiting Semi-Supervised Learning with Graph Embeddings”, from which the dataset is taken, reports 4732 edges and 3327 nodes.

[Screenshot from 2021-01-15 18-01-11]

Is there something I am missing about the data representation? Could you please clarify? Thank you.

rusty1s commented 3 years ago

Yes, you are right. There are some duplicated edges in the Planetoid datasets, which we remove in a pre-processing step. The same holds true for Cora.
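
To illustrate how duplicated edges can lower the reported edge count, here is a minimal pure-Python sketch. It is not PyG's actual pre-processing code; the helper name `count_undirected_edges` and the toy edge list are made up for illustration and are not Citeseer data.

```python
def count_undirected_edges(edges):
    """Count unique undirected edges, ignoring direction and exact duplicates."""
    # Sorting each pair maps (u, v) and (v, u) to the same key,
    # and the set drops repeated entries.
    unique = {tuple(sorted(edge)) for edge in edges}
    return len(unique)


# Toy edge list (hypothetical): contains a reversed copy and an exact duplicate.
raw_edges = [(0, 1), (1, 0), (1, 2), (1, 2), (2, 3)]

raw_count = len(raw_edges)
unique_count = count_undirected_edges(raw_edges)

print(f"raw entries: {raw_count}, unique undirected edges: {unique_count}")
```

Note that PyG stores an undirected edge as two directed entries in edge_index, so a directed edge count from the library is twice the undirected count discussed here.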