About what cSBM parameters to use

Xiuyu-Li commented 3 years ago

In Appendix A.5 of the paper, it is stated that n=5000 and f=2000. However, create_cSBM_dataset.sh sets n=800 and f=1000. Which set of parameters should I use?

Also, I was not able to find what average degree was used in the paper. Should I just set it to 5 as in create_cSBM_dataset.sh? Thanks.

Xiuyu-Li commented 3 years ago

If I used the parameters listed in the paper and set the average degree to 5, I got the following edge homophily table:

\phi    -1      -0.75   -0.5    -0.25   0       0.25    0.5
H(G)    0.042   0.075   0.171   0.323   0.5     0.678   0.824

which is much more homophilic than the table in the paper. Can you tell me the exact parameters used for generating cSBM synthetic datasets?

jianhao2016 commented 3 years ago

Hi Xiuyu,

Thank you for your interest in our work. For the cSBM datasets you should use what we stated in the supplement, i.e. n = 5000 and f = 2000. As for the average degree, the default of 5 should be fine. For the homophily table, can you elaborate a bit more on how you calculated the values, or paste here the function you used? Also, what value of epsilon did you use? It should be 3.25 instead of the default 0.1, which is too small.
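As a rough sanity check on these settings, the sketch below estimates the edge homophily one would expect from a two-class cSBM. It rests on assumptions that are not stated in this thread: the standard cSBM edge rates c_in = d + lambda*sqrt(d) and c_out = d - lambda*sqrt(d), and the parameterization lambda = sqrt(1 + epsilon) * sin(phi * pi / 2). The function name expected_edge_homophily is hypothetical and is not part of the repository.

```python
import math


def expected_edge_homophily(phi, epsilon=3.25, avg_degree=5.0):
    """Expected fraction of same-class edges in a two-class cSBM.

    Assumption: lambda = sqrt(1 + epsilon) * sin(phi * pi / 2), with
    intra-/inter-class edge rates c_in = d + lambda * sqrt(d) and
    c_out = d - lambda * sqrt(d), where d is the average degree.
    """
    lam = math.sqrt(1.0 + epsilon) * math.sin(phi * math.pi / 2.0)
    c_in = avg_degree + lam * math.sqrt(avg_degree)
    c_out = avg_degree - lam * math.sqrt(avg_degree)
    # With two equally sized classes, a random edge joins same-class
    # nodes with probability c_in / (c_in + c_out).
    return c_in / (c_in + c_out)


if __name__ == "__main__":
    for phi in (-1, -0.75, -0.5, -0.25, 0, 0.25, 0.5):
        print(f"phi={phi:+.2f}  expected H ~ {expected_edge_homophily(phi):.3f}")
```

Under these assumptions, with epsilon = 3.25 and an average degree of 5, the estimates land within about 0.005 of the values in the table reported earlier in this thread.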

Xiuyu-Li commented 3 years ago

Hi Jianhao,

Thank you for the quick reply. I used epsilon=3.25 in the experiments, and used the following code to generate the table:

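import torch
from torch_geometric.utils import remove_self_loops

def node_homophily(edge_idx, labels, num_nodes):
    # Drop self-loops so a node is never counted as its own neighbor.
    edge_index = remove_self_loops(edge_idx)[0]
    hs = torch.zeros(num_nodes)
    # minlength keeps degs at length num_nodes even if the last nodes have no edges.
    degs = torch.bincount(edge_index[0,:], minlength=num_nodes).float()
    matches = (labels[edge_index[0,:]] == labels[edge_index[1,:]]).float()
    # Per-node fraction of same-label neighbors; zero-degree nodes are filtered below.
    hs = hs.scatter_add(0, edge_index[0,:], matches) / degs
    return hs[degs != 0].mean()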

which should be consistent with how H(G) was defined.
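For cross-checking on small graphs, here is a dependency-free sketch of the same quantity: the mean, over nodes with at least one neighbor, of the fraction of neighbors sharing the node's label. The function name node_homophily_reference and the toy graph are made up for illustration; edges are assumed to be given as directed (source, target) pairs, with both directions listed for an undirected graph.

```python
def node_homophily_reference(edges, labels):
    """Mean per-node fraction of same-label neighbors (pure Python).

    edges: iterable of directed (src, dst) pairs; labels: list of class ids.
    Self-loops are ignored and nodes without neighbors are skipped.
    """
    same = [0] * len(labels)
    deg = [0] * len(labels)
    for src, dst in edges:
        if src == dst:
            continue  # ignore self-loops
        deg[src] += 1
        same[src] += int(labels[src] == labels[dst])
    ratios = [same[v] / deg[v] for v in range(len(labels)) if deg[v] > 0]
    return sum(ratios) / len(ratios)


# Toy path graph 0 - 1 - 2 plus an isolated node 3 (hypothetical data).
toy_edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
toy_labels = [0, 0, 1, 1]
toy_score = node_homophily_reference(toy_edges, toy_labels)
print(toy_score)
```

On the toy graph, node 0 has ratio 1, node 1 has ratio 1/2, node 2 has ratio 0, and node 3 is skipped.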

jianhao2016 commented 3 years ago

Can you show the code for remove_self_loops as well? I am trying to regenerate the values with your function on our tested dataset and a newly generated one to see if there is any difference.

Xiuyu-Li commented 3 years ago

Sure. It is just the torch_geometric function: from torch_geometric.utils import remove_self_loops.

jianhao2016 commented 3 years ago

Hi Xiuyu,

Thank you so much for pointing out this issue! I tested both your function and our previous function on the cSBM dataset and on other simple graphs, and it turns out you are correct about the homophily scores. There is a small bug in our code for computing the homophily scores (doing division with torch integers), which caused the numbers to be smaller. I have fixed it and now get results similar to yours. We will update the values in our paper accordingly. Thanks again for letting us know!
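To illustrate how integer division can deflate an averaged ratio like this, here is a small pure-Python sketch. It is not the repository's code: the function name mean_ratio and the toy counts are hypothetical, and Python's // operator stands in for truncating division on integer tensors.

```python
def mean_ratio(same_label_counts, degrees, integer_division):
    """Average of per-node (same-label neighbors / degree) ratios."""
    ratios = []
    for same, deg in zip(same_label_counts, degrees):
        if deg == 0:
            continue  # nodes without neighbors are skipped
        # Integer division truncates every ratio below 1 down to 0.
        ratios.append(same // deg if integer_division else same / deg)
    return sum(ratios) / len(ratios)


# Hypothetical counts for four nodes: same-label neighbors and degree.
same_counts = [1, 2, 3, 4]
degrees = [2, 4, 3, 4]

truncated = mean_ratio(same_counts, degrees, integer_division=True)
exact = mean_ratio(same_counts, degrees, integer_division=False)
print(truncated, exact)
```

Because every per-node ratio strictly between 0 and 1 is rounded down to 0, the truncated average can only be less than or equal to the exact one, which matches the direction of the discrepancy described above.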

jianhao2016 / GPRGNN

About what cSBM parameters to use #4