Question on the feature format for generated samples of ENZYMES dataset

harryjo97 / GDSS

Official Code Repository for the paper "Score-based Generative Modeling of Graphs via the System of Stochastic Differential Equations" (ICML 2022)

https://arxiv.org/abs/2202.02514


Question on the feature format for generated samples of ENZYMES dataset #11

Open lizaitang opened 1 year ago

lizaitang commented 1 year ago

Dear Author, your paper really helps a lot, but I have a question. I want to pass the generated graphs to some classifiers, but the node features generated by GDSS seem to differ in scale from those of the original dataset. How can I solve this? Thanks

harryjo97 commented 1 year ago

Hi lizaitang,

In our work, we used the degree of each node as the node features instead of the given node features of the original dataset. To use the node features of the original dataset, you can modify the code in https://github.com/harryjo97/GDSS/blob/4d96334fd0d07577f9891e9d5e81dae4d64a92fd/utils/data_loader.py#L6 and https://github.com/harryjo97/GDSS/blob/4d96334fd0d07577f9891e9d5e81dae4d64a92fd/utils/graph_utils.py#L43 so that they load the node features of the dataset. After changing these, you can retrain the score models to generate both the node features and the adjacency matrices.

lizaitang commented 1 year ago

Dear Author, thank you very much for your quick reply! How can we modify the code to load the node features of the dataset? Should we change the init to zeros or ones? Or should we directly replace x_tensor = init_features(config.data.init, adjs_tensor, config.data.max_feat_num) with the features of the dataset?

def init_features(init, adjs=None, nfeat=10):

    if init=='zeros':
        feature = torch.zeros((adjs.size(0), adjs.size(1), nfeat), dtype=torch.float32, device=adjs.device)
    elif init=='ones':
        feature = torch.ones((adjs.size(0), adjs.size(1), nfeat), dtype=torch.float32, device=adjs.device)
    elif init=='deg':
        feature = adjs.sum(dim=-1).to(torch.long)
        num_classes = nfeat
        try:
            feature = F.one_hot(feature, num_classes=num_classes).to(torch.float32)
        except:
            print(feature.max().item())
            raise NotImplementedError(f'max_feat_num mismatch')
    else:
        raise NotImplementedError(f'{init} not implemented')

    flags = node_flags(adjs)

    return mask_x(feature, flags)

harryjo97 commented 1 year ago

You can change init_features in https://github.com/harryjo97/GDSS/blob/4d96334fd0d07577f9891e9d5e81dae4d64a92fd/utils/graph_utils.py#L43 to take in graph_list as input and return the node features. To be specific, each graph in the graph_list is a networkx Graph with node features.

Or you could directly modify x_tensor = init_features(config.data.init, adjs_tensor, config.data.max_feat_num) in https://github.com/harryjo97/GDSS/blob/4d96334fd0d07577f9891e9d5e81dae4d64a92fd/utils/data_loader.py#L6 to obtain the original node features from the networkx Graph.

Please refer to the networkx documentation for more details.

FYI, the attributed graphs of the ENZYMES dataset are loaded by this function: https://github.com/harryjo97/GDSS/blob/4d96334fd0d07577f9891e9d5e81dae4d64a92fd/data/data_generators.py#LL131C13-L131C13

lizaitang commented 1 year ago

Dear Author, thank you so much for your quick reply. I have a minor question: I followed the format of graph_to_tensor to load the original node features, but for v, feature in g.nodes.data('feature') gives feature as None. Could you please help me fix it?

def feat_to_tensor(graph_list, max_node_num,max_feat_num):
    feat_list = []
    max_node_num = max_node_num

    for g in graph_list:
        assert isinstance(g, nx.Graph)

        node_feat_list = np.zeros([max_node_num,max_feat_num], dtype = float)
        i=0 
        for v, feature in g.nodes.data('feature'):

            node_feat_list[i]=feature

            i=i+1
        #print(node_feat_list)

        feat_list.append(node_feat_list)

    del graph_list

    feat_np = np.asarray(feat_list)
    del feat_list

    adjs_tensor = torch.tensor(feat_np, dtype=torch.float32)
    del feat_np

    return adjs_tensor

harryjo97 commented 1 year ago

The graph loader code is at https://github.com/harryjo97/GDSS/blob/4d96334fd0d07577f9891e9d5e81dae4d64a92fd/data/data_generators.py#L131. The node labels you are looking for are stored in g.nodes.data('label'); they are set at line 158 by G.add_node(i + 1, label=data_node_label[i]).

You may want to try g.nodes.data('label') instead of g.nodes.data('feature').
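
As a minimal sketch of this suggestion, the helper below reads an integer node attribute from each networkx graph and one-hot encodes it into a zero-padded array. The name labels_to_array, the num_classes argument, and the assumption that labels are integers starting at 1 are illustrative and not part of the GDSS code. The toy graph also shows why the 'feature' key yields None: networkx returns the default value (None) for an attribute that a node does not have.

```python
import networkx as nx
import numpy as np


def labels_to_array(graph_list, max_node_num, num_classes, key="label"):
    """One-hot encode an integer node attribute into a zero-padded array.

    Hypothetical helper: assumes every node stores an integer under `key`
    with values in 1..num_classes.
    """
    out = np.zeros((len(graph_list), max_node_num, num_classes), dtype=np.float32)
    for gi, g in enumerate(graph_list):
        for ni, (_, value) in enumerate(g.nodes.data(key)):
            if value is None:
                raise KeyError(f"node attribute {key!r} is missing in graph {gi}")
            out[gi, ni, int(value) - 1] = 1.0
    return out


# Toy graph built the same way as in the loader: G.add_node(i + 1, label=...)
g = nx.Graph()
for i, lab in enumerate([1, 3, 2]):
    g.add_node(i + 1, label=lab)
g.add_edges_from([(1, 2), (2, 3)])

# The 'feature' attribute was never set, so every entry is None.
missing = [feat for _, feat in g.nodes.data("feature")]

# Rows beyond the number of nodes stay all-zero (padding).
x = labels_to_array([g], max_node_num=4, num_classes=3)
```

The resulting array could then be converted with torch.tensor(x) and used in place of the output of init_features, provided the feature dimension in the config (config.data.max_feat_num) is set to match num_classes.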

lizaitang commented 1 year ago

Thanks for your reply, but if I want to generate graphs whose node features have the same format as the original dataset, shouldn't we use the features instead of the node labels?

harryjo97 commented 1 year ago

I think the node features you want to use for the classifier are contained in the label.

lizaitang commented 1 year ago

Sorry to bother you, but when I try label, I get:

```
[[2. 2. 2. ... 2. 2. 2.]
 [2. 2. 2. ... 2. 2. 2.]
 [2. 2. 2. ... 2. 2. 2.]
```

harryjo97 commented 1 year ago

First of all, the label contains values other than 2 (please see https://github.com/harryjo97/GDSS/blob/master/dataset/ENZYMES/ENZYMES_node_labels.txt).

Furthermore, if you want to use the node attributes in https://github.com/harryjo97/GDSS/blob/master/dataset/ENZYMES/ENZYMES_node_attributes.txt, you can change the code at https://github.com/harryjo97/GDSS/blob/4d96334fd0d07577f9891e9d5e81dae4d64a92fd/data/data_generators.py#L266 to set node_attributes=True. This loads the node attributes file via data_node_att = np.loadtxt(path + name + '_node_attributes.txt', delimiter=',') in https://github.com/harryjo97/GDSS/blob/4d96334fd0d07577f9891e9d5e81dae4d64a92fd/data/data_generators.py#L131
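
To illustrate what that np.loadtxt call returns, here is a small sketch that parses comma-separated attribute rows and maps one row to each node id. The in-memory text, its values, and the node_attrs dictionary are made up for illustration; the real file contents and the way the loader attaches the attributes to the graph may differ.

```python
import io

import numpy as np

# Stand-in for '<name>_node_attributes.txt': one comma-separated row per node.
# These numbers are invented for the example.
fake_file = io.StringIO("0.5,1.0,2.0\n1.5,0.0,4.0\n")

# Same call shape as in the loader: rows become a 2-D float array.
data_node_att = np.loadtxt(fake_file, delimiter=",")

# Node ids start at 1, matching G.add_node(i + 1, ...) in the loader.
node_attrs = {i + 1: data_node_att[i] for i in range(data_node_att.shape[0])}
```

Each entry of node_attrs is then a real-valued vector for one node, in contrast to the integer value stored under label.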