Open ShuaiWang97 opened 1 year ago
Thanks for your attention. I will try to debug it!
Thank you for the prompt response!
Most of the code structure comes from the HGNN node classification example, such as `def train`, `def infer`, and `if __name__ == "__main__"`. I also believe the co-citation datasets are the same as the ones used in the HyperGCN repo. I only changed the `net` and `data` variables.
Please let me know if I can provide more information. Thanks in advance!
As far as I can see, some nodes never appear in the edge list, and I am not sure whether that is intended. For example, in `CocitationPubmed` the number of nodes is 19717, but only 3840 nodes appear in the edge list.
I think we need to reorder the vertices to consecutive integers.
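from dhg.data import (
    CoauthorshipCora,
    CoauthorshipDBLP,
    CocitationCora,
    CocitationCiteseer,
    CocitationPubmed,
)

data_list = [
    (CoauthorshipCora, "cora_coauth"),
    (CoauthorshipDBLP, "dblp_coauth"),
    (CocitationCora, "cora_cocite"),
    (CocitationCiteseer, "citeseer_cocite"),
    (CocitationPubmed, "pubmed_cocite"),
]
for data_func, data_name in data_list:
    data = data_func()  # load each dataset once
    # Collect every vertex id that occurs in at least one hyperedge.
    nodes_in_edges = set()
    for edge in data["edge_list"]:
        nodes_in_edges.update(edge)
    # Prints: name, vertices that appear in edges, declared vertex count.
    print(data_name, len(nodes_in_edges), data["num_vertices"])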
cora_coauth 2388 2708
dblp_coauth 41302 41302
cora_cocite 1434 2708
citeseer_cocite 1458 3312
pubmed_cocite 3840 19717
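To make the reordering suggestion above concrete, here is a minimal sketch of relabeling the vertices that occur in an edge list to consecutive integers. It uses plain Python on a toy edge list. The helper name `relabel_vertices` is hypothetical and not part of DHG, and whether isolated vertices should be dropped at all is an assumption, not something confirmed in this thread.

```python
def relabel_vertices(edge_list):
    """Map the vertex ids that occur in edge_list to 0..k-1.

    Returns (new_edge_list, old_to_new). New ids follow the sorted order
    of the original ids, so the relative order of vertices is preserved.
    Hypothetical helper for illustration; not a DHG function.
    """
    used = sorted({v for edge in edge_list for v in edge})
    old_to_new = {old: new for new, old in enumerate(used)}
    new_edge_list = [tuple(old_to_new[v] for v in edge) for edge in edge_list]
    return new_edge_list, old_to_new


# Toy hypergraph: only vertices 2, 5 and 9 appear in any hyperedge.
edges = [(2, 5), (5, 9, 2)]
new_edges, mapping = relabel_vertices(edges)
print(new_edges)  # [(0, 1), (1, 2, 0)]
print(mapping)    # {2: 0, 5: 1, 9: 2}
```

If something like this were applied to a real dataset, the features, labels, and masks would need to be indexed with the same kept ids (the keys of `mapping`, in order) so that rows still line up with vertices.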
Also, for `CocitationPubmed`, the training set is strangely small, while the val and test sets are identical.
import torch
from dhg.data import CocitationPubmed

data = CocitationPubmed()  # load once instead of once per print
print(data["train_mask"].sum())
print(data["val_mask"].sum())
print(data["test_mask"].sum())
print(torch.all(data["val_mask"] == data["test_mask"]))
tensor(78)
tensor(19639)
tensor(19639)
tensor(True)
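The check above can be generalized into a small sanity report of split sizes and pairwise overlaps. The following is only a sketch in plain Python on toy masks. The helper name `summarize_splits` is hypothetical. For the tensor masks shown above, one would presumably pass `mask.tolist()`.

```python
def summarize_splits(train_mask, val_mask, test_mask):
    """Return (sizes, overlaps) for three boolean masks of equal length.

    sizes maps split name -> number of True entries.
    overlaps maps (split_a, split_b) -> number of positions True in both.
    Hypothetical helper for illustration; not a DHG function.
    """
    masks = {
        "train": list(train_mask),
        "val": list(val_mask),
        "test": list(test_mask),
    }
    sizes = {name: sum(1 for x in m if x) for name, m in masks.items()}
    names = list(masks)
    overlaps = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            overlaps[(a, b)] = sum(
                1 for x, y in zip(masks[a], masks[b]) if x and y
            )
    return sizes, overlaps


# Toy masks mimicking the pattern reported above: tiny train set,
# and val/test covering exactly the same vertices.
train = [True, False, False, False]
val = [False, True, True, True]
test = [False, True, True, True]
sizes, overlaps = summarize_splits(train, val, test)
print(sizes)
print(overlaps)
```

A non-zero overlap between `val` and `test` equal to both of their sizes is the situation described above, where the two splits are identical.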
Discussed in https://github.com/iMoonLab/DeepHypergraph/discussions/23