Closed Xiuyu-Li closed 3 years ago
If I used the parameters listed in the paper and set the average degree to 5, I got the following edge homophily table:

| \phi | -1 | -0.75 | -0.5 | -0.25 | 0 | 0.25 | 0.5 |
|---|---|---|---|---|---|---|---|
| H(G) | 0.042 | 0.075 | 0.171 | 0.323 | 0.5 | 0.678 | 0.824 |

which is much more homophilic than the table in the paper. Can you tell me the exact parameters used for generating the cSBM synthetic datasets?
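(For context, edge homophily H(G) is commonly defined as the fraction of edges whose endpoints share a label. A minimal sketch of that definition on a toy graph of my own, not one of the actual datasets:)

```python
import torch

# Toy directed edge list and labels (hypothetical, for illustration only)
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]])
labels = torch.tensor([0, 0, 1])

# Fraction of edges joining same-label endpoints:
# (0,1) and (1,0) match, (1,2) and (2,1) do not -> 2/4 = 0.5
h_edge = (labels[edge_index[0]] == labels[edge_index[1]]).float().mean()
```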
Hi Xiuyu,
Thank you for your interest in our work. For the cSBM datasets you should use what we stated in the supplement, i.e. n = 5000 and f = 2000. As for the average degree, the default of 5 should be fine. For the homophily table, could you elaborate a bit more on how you calculate the value, or paste the function you used here? Also, what value of epsilon did you use? It should be 3.25 instead of the default 0.1, which would be too small.
Hi Jianhao,
Thank you for the quick reply. I used epsilon=3.25 in the experiments, and used the following code to generate the table:
```python
import torch
from torch_geometric.utils import remove_self_loops

def node_homophily(edge_idx, labels, num_nodes):
    # Drop self-loops so a node's own label does not count as a match
    edge_index = remove_self_loops(edge_idx)[0]
    hs = torch.zeros(num_nodes)
    # Out-degree of each source node (minlength keeps the shape at num_nodes
    # even when the highest-indexed nodes are isolated)
    degs = torch.bincount(edge_index[0, :], minlength=num_nodes).float()
    # 1.0 where an edge connects two nodes with the same label
    matches = (labels[edge_index[0, :]] == labels[edge_index[1, :]]).float()
    # Per-node fraction of same-label neighbors, averaged over non-isolated nodes
    hs = hs.scatter_add(0, edge_index[0, :], matches) / degs
    return hs[degs != 0].mean()
```
which should be consistent with how H(G) was defined.
Can you show the code for `remove_self_loops` as well? I am trying to regenerate the values with your function on our tested dataset and a newly generated one to see if there is any difference.
Sure. It is just the torch_geometric function: `from torch_geometric.utils import remove_self_loops`.
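(In case it helps anyone reading along: what that utility does can be sketched in plain torch as keeping only edges whose endpoints differ. Toy edge list of my own, for illustration:)

```python
import torch

edge_index = torch.tensor([[0, 1, 1, 2],
                           [0, 0, 2, 2]])  # (0,0) and (2,2) are self-loops

# Keep only the columns where source != target
mask = edge_index[0] != edge_index[1]
edge_index = edge_index[:, mask]
# tensor([[1, 1],
#         [0, 2]])
```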
Hi Xiuyu,
Thank you so much for pointing out this issue! I have tested both your function and our previous function on the cSBM dataset and another simple graph, and it turns out you're correct about the homophily scores. There was a small bug in our code when computing the homophily scores (doing division on torch integer tensors), which caused the numbers to be smaller. I have fixed it and got results similar to yours. We will update the values in our paper accordingly. Thanks again for letting us know!
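(For readers hitting the same pitfall, a minimal sketch with made-up numbers, not the actual dataset values: floor division on integer tensors truncates each per-node ratio toward zero, so the averaged homophily comes out too small.)

```python
import torch

# Hypothetical per-node counts, for illustration only
matches = torch.tensor([1, 2, 3])  # same-label edges per node
degs = torch.tensor([2, 4, 5])     # node degrees

# Floor division on integer tensors truncates toward zero,
# so every per-node score collapses to 0 here
buggy = matches // degs            # tensor([0, 0, 0])

# Casting to float first gives the intended ratios
fixed = matches.float() / degs.float()  # tensor([0.5000, 0.5000, 0.6000])
```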
In the Appendix A.5 of the paper, it is stated that n = 5000 and f = 2000. However, `create_cSBM_dataset.sh` sets n = 800 and f = 1000. Which set of parameters should I use? Also, I was not able to find what average degree was used in the paper. Should I just set it to 5, as in `create_cSBM_dataset.sh`? Thanks.