iMoonLab / DeepHypergraph

A PyTorch library for graph and hypergraph computation.
https://deephypergraph.com/
Apache License 2.0

Kindly need help for several reproduction results #24

Open ShuaiWang97 opened 1 year ago

ShuaiWang97 commented 1 year ago

Discussed in https://github.com/iMoonLab/DeepHypergraph/discussions/23

Originally posted by **ShuaiWang97**, December 5, 2022

To the community,

Hope you had a great weekend. Thank you so much for building this package! I am quite interested in hypergraphs and have learned a lot from the tutorial and source code. I tried to use the methods and datasets from the package to reproduce several results. The performance on the co-authorship datasets seems good, but the performance on the co-citation datasets seems a bit low. I checked the implementation several times but did not find any problem. Can anyone please help me a bit?

The node-classification accuracy scores on several co-citation datasets (`CocitationCora`, `CocitationCiteseer`, `CocitationPubmed`) with HGNN, HyperGCN, and HGNN+ are shown below, and the code is attached. The only thing I change to switch datasets and methods is the `data` and `net` variables. Any ideas are incredibly welcome. Thanks in advance.

![image](https://user-images.githubusercontent.com/54426844/205740684-d1c9300e-31da-4126-b37e-21f7ba58daef.png)

```python
import time
from copy import deepcopy

import torch
import torch.optim as optim
import torch.nn.functional as F

from dhg import Hypergraph, Graph
from dhg.data import (
    Cooking200, CoauthorshipCora, CocitationCora, CocitationCiteseer,
    CoauthorshipDBLP, CocitationPubmed, Citeseer, Cora, Pubmed,
)
from dhg.models import HGNN, HyperGCN, HGNNP
from dhg.random import set_seed
from dhg.metrics import HypergraphVertexClassificationEvaluator as Evaluator

from data import data
# from config import config


def train(net, X, A, lbls, train_idx, optimizer, epoch):
    net.train()
    st = time.time()
    optimizer.zero_grad()
    # forward pass on the features X and the hypergraph structure A
    outs = net(X, A)
    outs, lbls = outs[train_idx], lbls[train_idx]
    loss = F.cross_entropy(outs, lbls)
    # loss = F.nll_loss(outs, lbls)  # decreases performance a lot
    loss.backward()
    optimizer.step()
    print(f"Epoch: {epoch}, Time: {time.time()-st:.5f}s, Loss: {loss.item():.5f}")
    return loss.item()


@torch.no_grad()
def infer(net, X, A, lbls, idx, test=False):
    net.eval()
    outs = net(X, A)
    outs, lbls = outs[idx], lbls[idx]
    if not test:
        res = evaluator.validate(lbls, outs)
    else:
        res = evaluator.test(lbls, outs)
    return res


if __name__ == "__main__":
    set_seed(2021)
    # args = config.parse()
    device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
    evaluator = Evaluator(["accuracy", "f1_score", {"f1_score": {"average": "micro"}}])

    # Load one of the co-citation datasets: CocitationCiteseer, CocitationCora, CocitationPubmed
    # data = CocitationCora()
    data = CocitationCiteseer()

    # Build the hypergraph from the edge list
    X, lbl = data["features"], data["labels"]
    HG = Hypergraph(data["num_vertices"], data["edge_list"])

    # net = HGNNP(data["dim_features"], 16, data["num_classes"], use_bn=False)
    net = HGNN(data["dim_features"], 16, data["num_classes"], use_bn=False)
    print("net is: ", net)
    optimizer = optim.Adam(net.parameters(), lr=0.01, weight_decay=0.0005)

    train_mask = data["train_mask"]
    val_mask = data["val_mask"]
    test_mask = data["test_mask"]
    print(f"length of train is: {sum(train_mask)}, length of val is: {sum(val_mask)}, length of test is: {sum(test_mask)}")

    X, lbl = X.to(device), lbl.to(device)
    HG = HG.to(device)
    net = net.to(device)

    best_state = None
    best_epoch, best_val = 0, 0
    for epoch in range(200):
        # train
        train(net, X, HG, lbl, train_mask, optimizer, epoch)
        # validation
        if epoch % 10 == 0:
            with torch.no_grad():
                val_res = infer(net, X, HG, lbl, val_mask)
                print("val acc is: ", infer(net, X, HG, lbl, val_mask, test=True)["accuracy"])
                print("val_res is: ", val_res)
            if val_res > best_val:
                print(f"update best: {val_res:.5f}")
                best_epoch = epoch
                best_val = val_res
                best_state = deepcopy(net.state_dict())

    print("\ntrain finished!")
    print(f"best val: {best_val:.5f}")

    # test
    print("test...")
    net.load_state_dict(best_state)
    res = infer(net, X, HG, lbl, test_mask, test=True)
    print(f"final result: epoch: {best_epoch}")
    print(res)
```

Best,
Shuai
yifanfeng97 commented 1 year ago

Thanks for your attention. I will try to debug it!

ShuaiWang97 commented 1 year ago

Thank you for the prompt response! Most of the code structure comes from the HGNN node-classification example, such as `def train`, `def infer`, and the `if __name__ == "__main__"` block. I also think the co-citation datasets are the same as the ones used in the HyperGCN repo. I only changed the `net` and `data` variables. Please let me know if I can provide more information. Thanks in advance!

bokveizen commented 8 months ago

As far as I can see, some nodes never appear in the edge list. I am not sure whether that is intended. For example, in CocitationPubmed, the number of nodes is 19717, but only 3840 of them appear in the edge list.

I think we need to reorder the vertices to consecutive integers (a rough relabeling sketch follows the output below).

```python
data_list = [
    (CoauthorshipCora, "cora_coauth"),
    (CoauthorshipDBLP, "dblp_coauth"),
    (CocitationCora, "cora_cocite"),
    (CocitationCiteseer, "citeseer_cocite"),
    (CocitationPubmed, "pubmed_cocite"),
]

for data_func, data_name in data_list:
    nodes_in_edges = set()
    for edge in data_func()["edge_list"]:
        nodes_in_edges.update(edge)
    print(data_name, len(nodes_in_edges), data_func()["num_vertices"])
```

```
cora_coauth 2388 2708
dblp_coauth 41302 41302
cora_cocite 1434 2708
citeseer_cocite 1458 3312
pubmed_cocite 3840 19717
```
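
If those isolated vertices are not intended, one way to follow up on the relabeling idea is to drop the vertices that never occur in any hyperedge and renumber the rest consecutively before building the hypergraph. This is only a rough sketch (the names `old2new`, `keep`, `new_edge_list` are made up here, not a dhg utility):

```python
import torch
from dhg import Hypergraph
from dhg.data import CocitationPubmed

data = CocitationPubmed()

# Collect the vertices that actually occur in at least one hyperedge
# and map their old IDs to new consecutive IDs 0..k-1.
nodes_in_edges = sorted({v for e in data["edge_list"] for v in e})
old2new = {old: new for new, old in enumerate(nodes_in_edges)}

# Relabel every hyperedge with the new consecutive IDs.
new_edge_list = [[old2new[v] for v in e] for e in data["edge_list"]]

# Slice features, labels, and masks down to the kept vertices.
keep = torch.tensor(nodes_in_edges, dtype=torch.long)
X = data["features"][keep]
lbl = data["labels"][keep]
train_mask = data["train_mask"][keep]
val_mask = data["val_mask"][keep]
test_mask = data["test_mask"][keep]

# Hypergraph over only the vertices that occur in edges (3840 for CocitationPubmed).
HG = Hypergraph(len(nodes_in_edges), new_edge_list)
print(HG)
```

Whether dropping the isolated vertices (instead of keeping them as nodes without hyperedges) matches the original papers' setup is a separate question, so this is only meant as a diagnostic experiment.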

Also, for CocitationPubmed, the training set is strangely small, while the val and test sets are identical.

```python
print(CocitationPubmed()["train_mask"].sum())
print(CocitationPubmed()["val_mask"].sum())
print(CocitationPubmed()["test_mask"].sum())
print(torch.all(CocitationPubmed()["val_mask"] == CocitationPubmed()["test_mask"]))
```

```
tensor(78)
tensor(19639)
tensor(19639)
tensor(True)
```
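
If the tiny training set (78 vertices) turns out to be part of the problem, one quick experiment is to rebuild the masks manually with a larger, disjoint train/val/test partition. A minimal sketch follows; the split sizes `n_train` and `n_val` are arbitrary placeholders, not the splits used by dhg or by any paper:

```python
import torch
from dhg.data import CocitationPubmed
from dhg.random import set_seed

set_seed(2021)
data = CocitationPubmed()
n = data["num_vertices"]

# Random permutation of all vertex indices, then slice into three disjoint parts.
perm = torch.randperm(n)
n_train, n_val = 500, 500  # arbitrary example sizes

train_mask = torch.zeros(n, dtype=torch.bool)
val_mask = torch.zeros(n, dtype=torch.bool)
test_mask = torch.zeros(n, dtype=torch.bool)

train_mask[perm[:n_train]] = True
val_mask[perm[n_train:n_train + n_val]] = True
test_mask[perm[n_train + n_val:]] = True

# Sanity check: the three masks do not overlap.
assert not (train_mask & val_mask).any()
assert not (val_mask & test_mask).any()
```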