ma-compbio / Hyper-SAGNN

hypergraph representation learning, graph neural network
MIT License

How to reproduce the results and visualizations that are in paper #6

Closed ans92 closed 3 years ago

ans92 commented 3 years ago

Hi, thank you for the great code. I can run the code successfully, but it doesn't show any output like the results in the paper. How can I reproduce the results and visualizations shown in the paper? Your guidance would be highly appreciated.

ruochiz commented 3 years ago

The code and data provided in the repo should be able to reproduce the reported results on the 4 datasets we studied in the paper. Could you provide more details on how you run the code (the command and parameters), which dataset you are trying, and specifically which result you have trouble reproducing?

ans92 commented 3 years ago

Thank you @ruochiz for the reply. I ran the following command: !python /content/drive/MyDrive/Hyper-SAGNN-master/Hyper-SAGNN-master/Code/main.py --data drug -f adj

We ran it in Google Colab. I would like it to reproduce the results shown in the table in the paper, and also to produce the graphs given in the results section and in the Appendix.

Please let me know if you need any other details. Also, if it is possible for you to add more comments to the code to make it beginner and student friendly, we would be able to grasp and understand the code more easily. Thank you.

ans92 commented 3 years ago

Hi @ruochiz, you haven't replied to my comment. Can you please guide me on how to reproduce the results of this paper? Or is there something that I am doing wrong?

ruochiz commented 3 years ago

The way you are running the code seems correct. There are three tasks we discussed in the paper:

  1. network reconstruction, which is the training AUC/AUPR scores displayed during the training process.
  2. link prediction (hyperedge prediction), which is roughly the validation AUC/AUPR scores printed during the training process.
  3. Node classification. We use the same dataset as the DHNE (https://github.com/tadpole/DHNE). The label information can be accessed within the .npz file. As described in the paper, we use simple Logistic Regression as the classifier. I attached the code snippet we used for node classification on MovieLens for your reference:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import average_precision_score, precision_score, recall_score, f1_score
from sklearn.metrics import roc_auc_score, accuracy_score, matthews_corrcoef
from sklearn.preprocessing import MultiLabelBinarizer,StandardScaler
from sklearn.multiclass import OneVsRestClassifier,OneVsOneClassifier
from sklearn.linear_model import LogisticRegression

import warnings

warnings.filterwarnings("ignore")

def main():
    embedding_my = np.load("../mymodel_1.npy", allow_pickle=True)

    train_data = np.load("../DHNE/data/MovieLens/train_data.npz", allow_pickle=True)
    label = train_data['labels']
    m = MultiLabelBinarizer().fit(label)
    label =  m.transform(label)

    index = (np.sum(label, axis=-1) > 0)

    embedding_my = embedding_my[index]
    label = label[index]

    print('train_num', 'micro', 'macro')
    for i in range(9):
        train_num = (i + 1) * 0.1

        # Hold out a fraction (1 - train_num) of the labeled nodes for testing.
        X1_train, X1_test, y_train, y_test = train_test_split(
            embedding_my, label, test_size=1 - train_num, random_state=42)

        str1 = 'our'
        train = X1_train
        test = X1_test
        clf = OneVsRestClassifier(LogisticRegression())

        normalizer = StandardScaler().fit(train)
        train = normalizer.transform(train)
        test = normalizer.transform(test)
        clf.fit(train, y_train)

        y_pred = clf.predict(test)
        micro, macro = f1_score(y_test, y_pred, average='micro'), f1_score(y_test, y_pred, average='macro')
        print("%.2f\t%.3f\t%.3f" % (train_num, micro, macro))

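    # Scores from the last split (90% of the nodes used for training).
    return micro, macro


if __name__ == "__main__":
    main()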

If you have a specific question, could you point to the table or figure you are having trouble with?

ans92 commented 3 years ago

Thank you @ruochiz. Can you please let me know the loss function used in this paper? I am not able to find it in the paper.

ruochiz commented 3 years ago

Binary cross entropy loss, where the positive samples are observed hyperedges and the negative samples are random node tuples that are not observed as hyperedges.
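
For readers following along, here is a minimal sketch of that loss in plain Python. It is not the repository's code: the function name `binary_cross_entropy` and the four example probabilities are made up for illustration. Label 1 stands for an observed hyperedge and label 0 for a random node tuple.

```python
import math


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        # Clamp the probability so that log(0) can never occur.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


# Hypothetical model outputs: two observed hyperedges (label 1) and
# two random node tuples that are not hyperedges (label 0).
probs = [0.9, 0.8, 0.2, 0.1]
labels = [1, 1, 0, 0]

loss = binary_cross_entropy(probs, labels)
print("BCE loss:", round(loss, 4))
```

The loss is small when observed hyperedges get probabilities close to 1 and random tuples get probabilities close to 0.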

ans92 commented 3 years ago

OK, thank you again for your response. When I run the following command, it gives me an error: !python /content/drive/MyDrive/Hyper-SAGNN-master/Hyper-SAGNN-master/Code/main.py --data MovieLens -f adj

The error is as follows:

2021-06-27 07:14:56.241406: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
1.0 -0.5 model_no_randomwalk no specific train weight Node type num [146 70 5] [5. 5. 5. ... 5. 5. 5.] [5. 5. 5. ... 5. 5. 5.] 5.0 5.0 walk type adding pad idx dict_size 1154 1436 train data amount 1154
Traceback (most recent call last):
  File "/content/drive/MyDrive/Hyper-SAGNN-master/Hyper-SAGNN-master/Code/main.py", line 638, in
    with tf.Graph().as_default(), tf.Session() as session:
AttributeError: module 'tensorflow' has no attribute 'Session'

Can you help me understand why this happens? This happened with the GPS dataset. Wordnet does not show this error. Thank you for your help.

ruochiz commented 3 years ago

I'm guessing you are using tf2.0 or higher? The TensorFlow API changed in tf2.0, and things like Session() no longer exist. If you just use the -f adj mode, I can upload a newer version that does not use TensorFlow later this week.

ans92 commented 3 years ago

Ok Thank you

ans92 commented 3 years ago

Hi @ruochiz, I am a student and have limited knowledge of this area. Can you please explain why we take negative samples along with positive samples, and why we use both a static embedding and a dynamic embedding? I have read the paper but couldn't properly understand why we have both static and dynamic embeddings.

Another thing I want to ask: the paper says that this new model can work on variable-sized hyperedges. But I have looked at all the datasets and found that they all have three-node hyperedges in both training and testing. So how have you verified that it can work on variable-sized hyperedges?

I have also found that only the wordnet and MovieLens datasets have labels and index labels. The GPS and drug datasets do not have labels. And where labels exist, they are not available for all the hyperedges. For example, there are 117153 hyperedges in the training data of wordnet but only 40750 labels for hyperedges in the training data. Can you please help me understand all the above points? I would be very thankful for your time and help.

ruochiz commented 3 years ago
  1. The model is trained with a binary classification loss (BCE). Thus it needs both positive and negative samples to train the model. The model is essentially trained to distinguish observed hyperedges versus random combinations of nodes. Similar techniques (negative sampling) have been used in word2vec, node2vec etc.
  2. The static embeddings reflect the general properties of a node, while the dynamic ones reflect the node's properties in a specific hyperedge. The assumption is that for a hyperedge to be "stable", a node's dynamic properties should not be too far from its static ones. (An analogy: the static embedding is a person's general personality, and the dynamic embedding reflects what that person does in a specific friend group. The assumption is that the friendship is stable only when these two are not that different.)
  3. We tested it on datasets with hyperedges of size 2 and 3 in the HyperSAGNN paper.
  4. Those labels are not for the hyperedges, they are for the nodes.
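
To make point 1 concrete, here is a small sketch of negative sampling in plain Python. It is a simplification and not the sampler used in the repository: the function `sample_negatives`, the toy node ids and the "one node per node type" rule are all assumptions made for illustration.

```python
import random


def sample_negatives(observed, nodes_per_type, num_neg, seed=0):
    """Draw random node tuples (one node per type) that are not observed hyperedges."""
    observed = set(observed)

    # Guard against asking for more negatives than can exist
    # (assumes every observed tuple lies in the candidate space).
    total = 1
    for nodes in nodes_per_type:
        total *= len(nodes)
    if num_neg > total - len(observed):
        raise ValueError("not enough unobserved tuples to sample from")

    rng = random.Random(seed)
    negatives = set()
    while len(negatives) < num_neg:
        candidate = tuple(rng.choice(nodes) for nodes in nodes_per_type)
        if candidate not in observed:
            negatives.add(candidate)
    return sorted(negatives)


# Toy hypergraph with three node types (all ids are hypothetical).
nodes_per_type = [[0, 1, 2], [3, 4], [5, 6]]
observed = [(0, 3, 5), (1, 4, 6)]  # positive samples: observed hyperedges

negatives = sample_negatives(observed, nodes_per_type, num_neg=4)
print("negative samples:", negatives)
```

The positives (label 1) and these negatives (label 0) together form the training set for the binary classifier.
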
ans92 commented 3 years ago

OK, thank you @ruochiz. Your answers were great and helped me a lot. If you allow, I want to ask one more thing. The paper says that the results can be improved if we also include first-order neighbors in an information aggregation step before calculating the static and dynamic embeddings. How can I know that, let's say, nodes A and B are first-order neighbors of node C? The training dataset currently consists of tuples of nodes, so to get the first-order neighbors of a node, can I use the same dataset or do I need a completely different form of dataset? If you can clarify this point, that would be great. Thank you again for your help.

ruochiz commented 3 years ago

You can construct a graph based on the hypergraph via decomposition or find another graph that contains these nodes as well.

ans92 commented 3 years ago

Thank you @ruochiz. Your first idea seems better to me. Can you share more details about how I can construct a graph from the hypergraph by decomposing it? I mean a library or some other resource. Thank you.

ruochiz commented 3 years ago

The decomposition of a hypergraph can be pretty simple. For instance, just decompose a hyperedge connecting (n1, n2, n3) into three edges (n1, n2), (n2, n3), (n1, n3).
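
A minimal sketch of this decomposition using only the Python standard library. The helper names `decompose` and `first_order_neighbors` and the toy hyperedges are hypothetical and not part of the repository.

```python
from collections import defaultdict
from itertools import combinations


def decompose(hyperedges):
    """Turn every hyperedge into all pairwise edges between its nodes."""
    edges = set()
    for hyperedge in hyperedges:
        for u, v in combinations(sorted(set(hyperedge)), 2):
            edges.add((u, v))
    return edges


def first_order_neighbors(edges):
    """Build a node -> set of directly connected nodes mapping."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    return dict(neighbors)


# Toy hypergraph: one hyperedge of size 3 and one of size 2.
hyperedges = [("n1", "n2", "n3"), ("n3", "n4")]

edges = decompose(hyperedges)
neighbors = first_order_neighbors(edges)

print(sorted(edges))
print(sorted(neighbors["n3"]))
```

With this mapping, the first-order neighbors of a node are simply all nodes that share at least one hyperedge with it.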

ans92 commented 3 years ago

Yes, you are right. But I have read in the paper that not all hyperedges are decomposable into pairwise edges. The introduction mentions, referencing the DHNE (Deep Hyper-Network Embedding) paper, that DHNE suggested the existence of heterogeneous indecomposable hyperedges. So do you think this method of decomposing the hypergraph into pairwise edges will work?

The second thing is that the paper says we need to aggregate information over all the first-order neighbors before calculating the static/dynamic embedding of a node. What is meant by this information aggregation process?

Again, thank you for your help and effort.

ans92 commented 3 years ago

Another thing I want to ask about is the results of the current model. I ran the GPS dataset with random walk and got this output: [screenshot: gps random walk]

And I got the following output with adj (the encoder method) on the GPS dataset: [screenshot: gps encoder based]

My question is about the output that you reported in the paper. How did you calculate the AUC and AUPR values in the table in the paper? Did you take the results of the last epoch or the average over all epochs, and did you use the training results or the validation results?

The paper shows two tables. Table 1 gives AUC and AUPR values for network reconstruction, and Table 2 gives the performance evaluation based on AUROC and AUPR for hyperedge/edge prediction. I ran the model and got the above results. Do they correspond to Table 1 (network reconstruction) or to Table 2 (hyperedge/edge prediction)?

I am very thankful for your help and support.

ruochiz commented 3 years ago
  1. When we said some hyperedges are not decomposable, we meant they are not decomposable when you are trying to study the higher-order information. But if you are just finding the first-order neighbors of nodes, then decomposition would be a good start. Of course, there can be other methods to do so.
  2. I cannot give you detailed research advice. But this has been studied by other works on graphs and hypergraphs.
  3. Network reconstruction: training AUC/AUPR. Link prediction: validation AUC/AUPR. The experiment settings are kept the same as in the DHNE paper.
ans92 commented 3 years ago

OK, great. Thanks for your reply. Can you please clarify the third point, that is, how you filled in the values in the table? Did you take the average of AUC/AUPR over 300 epochs, the AUC/AUPR of the last epoch, or use some other method? Thank you.

ans92 commented 3 years ago

@ruochiz Sir, if you are comfortable with it, can I ask you a few more questions regarding the code of this paper and the output of its functions? Thank you.

ans92 commented 3 years ago

Hi @ruochiz, the following are two lines from main.py:

train_weight = train_weight / train_weight_mean * neg_num
test_weight = test_weight / train_weight_mean * neg_num

These are lines 586 and 587, I guess. My confusion is that the values of train_weight and test_weight are unchanged after these lines. I think it's because the weights are first divided and then multiplied by neg_num, so they get their original values back. Don't you think brackets are missing there, so that the weights actually change? Currently train_matrix is [5. 5. ...... 5.] and test_matrix is [1. 1. ..... 1.] before and after this operation.

ruochiz commented 3 years ago
  1. For all the methods, during training, we kept the parameters that yield the best performance on the test set. We report the train/test AUC/AUPR with that set of parameters. The process is repeated 10 times and we reported the average over these 10 runs.
  2. The train/test weights are used as "sample importance". Thus there is no error here. (Because there are more negative samples, the models are trained to regard positive samples as equally important. But during testing, we don't need to reweight the pos/neg samples.)
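
As an illustration of point 2, here is a small sketch of the rescaling quoted from main.py, written with plain Python lists. The helper `rescale_weights` is hypothetical, and `neg_num = 5` is an assumption chosen to match the values quoted in this thread. Dividing by the mean and multiplying by `neg_num` makes the average positive weight equal to `neg_num`, so weights that already have mean 5 stay at 5.

```python
def rescale_weights(weights, reference_mean, neg_num):
    """Apply weight / reference_mean * neg_num to every sample weight."""
    return [w / reference_mean * neg_num for w in weights]


neg_num = 5  # assumed number of negative samples per positive sample

# Case from this thread: all positive train weights are already 5.
train_weight = [5.0, 5.0, 5.0]
train_weight_mean = sum(train_weight) / len(train_weight)
same = rescale_weights(train_weight, train_weight_mean, neg_num)
print(same)  # unchanged, because the mean already equals neg_num

# Test weights use the same train mean and neg_num, as in the quoted line.
test_weight = rescale_weights([1.0, 1.0], train_weight_mean, neg_num)
print(test_weight)

# Hypothetical uneven weights: relative importance is kept,
# but the mean is moved to neg_num.
uneven = [1.0, 2.0, 3.0]
uneven_mean = sum(uneven) / len(uneven)
rescaled = rescale_weights(uneven, uneven_mean, neg_num)
print(rescaled)
```

So the two lines only change the weights when their mean differs from `neg_num`; no brackets are missing.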

I understand that you perhaps don't have much experience in this field. I would suggest you get familiar with some of the deep learning and graph representation learning basics first.

ans92 commented 3 years ago

Thank you for your reply. Yes, I will definitely look into the basics of graph representation learning. But even then, you are the best person to ask questions regarding this paper and its code. I really appreciate your help and cooperation. Thanks again.

I have 2 questions regarding the output of the code. First, what does this output mean? It is the output of the generate_H function:

H[0]:
  (0, 54) 2.236068
  (0, 105) 2.236068
  (0, 213) 2.236068
  (0, 550) 2.236068
  (1, 130) 2.236068
  (1, 245) 2.236068
  .......
  (144, 584) 2.236068
  (144, 1049) 2.236068
  (145, 665) 2.236068

I know that the first column represents the node labels, the second column represents the num from input length and the third column represents the sqrt of the weight. But I couldn't understand the idea behind this representation and what it is telling us. This output is then converted to the following embeddings:

  (0, 220) 5.0
  (0, 219) 5.0
  (0, 153) 15.0
  (0, 150) 10.0
  (1, 157) 10.0
  (1, 220) 15.0
  .......
  (144, 147) 5.0
  (144, 144) 15.0
  (145, 220) 5.0
  (145, 164) 5.0

Their size has been reduced from 1154 to 221. I know the size has been reduced due to the dot product. But why do we need to reduce the size, and what does the output now refer to?

In the next step we get the initial embeddings as:

  (0, 74) 0.05263158
  (0, 73) 0.1
  (0, 7) 1.0
  (1, 72) 0.14285715
  (1, 11) 1.0
  .......
  (144, 70) 0.09090909
  (144, 1) 1.0
  (145, 74) 0.05263158
  (145, 18) 1.0

What do these initial embeddings represent, given that their size has decreased further? What I understood is that each row represents the embedding of a single node. But if this is the case, why do we have multiple embeddings for the same node? Is this for calculating the dynamic embeddings? Can you please make these things clear?

My second question is about the columns of the training data, which you have changed for multiple node types. Our initial training data was:

train_data
[[ 93 57 4]
 [ 17 7 0]
 [ 26 13 4]
 ...
 [108 15 1]
 [113 1 0]
 [141 41 3]]

But you have changed it as below for multiple node types:

train_data
[[ 93 203 220]
 [ 17 153 216]
 [ 26 159 220]
 ...
 [108 161 217]
 [113 147 216]
 [141 187 219]]

I can see that only the second and third columns have been changed and the first column remains unchanged. Can you please explain why you made this change and what its benefit is?

Lastly, you have referred to test data. I can see test_data in the .npz files of the datasets and also in the code. Do I need to change the code in order to get the test accuracy, or is it already shown in the form of the validation results? I can see the results of training and validation, so I am assuming that those validation results are for test_data. Right? Or do I need to run this model separately on test_data? I would be very thankful for your help.

ans92 commented 3 years ago

Thank you @ruochiz for your comment on my previous question regarding calculating the results. You said that you run the model with the best parameters and then take the average of the train and test results. I want to clarify one point that is confusing me. Currently there are 300 epochs during training, so the first time you train the model you get 300 AUC/AUPR values. Do I have to average all those 300 AUC/AUPR values to get the AUC/AUPR value of the first training experiment, then similarly average all 300 values to get the AUC/AUPR of the second experiment, and so on? Is this the way to calculate the AUC/AUPR values for the training phase?

And according to my knowledge there are no epochs in the test phase, so there is a single AUC/AUPR output value each time. Am I right?

ruochiz commented 3 years ago
  1. The output is the sparse matrix representation (check scipy.sparse.coo_matrix). It corresponds to the (row, col, val) of all nonzero values. It's an adjacency matrix of a node with the other nodes (check the DHNE paper, they have a more detailed description of why and how to get this adjacency matrix).
  2. The node embeddings are generated from a dictionary. In the original format there is a node 0 in each of the first, second and third columns, corresponding to three node types. So to make it less confusing for the model, we added a constant to the second and third columns such that there are no overlapping node ids across different node types.
  3. valid is test in this case. There is no averaging over 300. We kept the setting the same as DHNE, as I mentioned. You find the epoch with the best valid acc. You keep that model, and record the AUC/AUPR of training and test. We repeat this process multiple times and average the AUC/AUPR of these repeated experiments.
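
The id shift from point 2 can be reproduced with a short sketch. The helper `offset_node_ids` is hypothetical and not the repository's function. The node type counts [146, 70, 5] come from the log posted earlier in this thread, and the rows are the first three train_data rows quoted above. The resulting offsets are 0, 146 and 146 + 70 = 216.

```python
from itertools import accumulate


def offset_node_ids(tuples, node_type_num):
    """Shift ids of each node type so that ids never overlap across types."""
    # Offsets are the cumulative counts of the preceding node types: [0, 146, 216].
    offsets = [0] + list(accumulate(node_type_num))[:-1]
    return [[nid + off for nid, off in zip(row, offsets)] for row in tuples]


node_type_num = [146, 70, 5]  # "Node type num" from the log in this thread
train_data = [[93, 57, 4], [17, 7, 0], [26, 13, 4]]

shifted = offset_node_ids(train_data, node_type_num)
for row in shifted:
    print(row)
```

After the shift, every node has a unique id in the range 0 to 220, which matches the size of 221 mentioned in the question.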

I would suggest you go through the DHNE paper first. Part of our structure is built on it, and it is the main baseline method we compared to. It would be very helpful if you understand the structure and experiment setting of DHNE.
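
The reporting protocol from point 3 above can be sketched as follows. This is only an illustration: the function names, the dictionary keys and all numbers are made up, and each toy run has three epochs instead of 300.

```python
def select_best_epoch(history):
    """Return the epoch record with the highest validation accuracy."""
    return max(history, key=lambda epoch: epoch["valid_acc"])


def average_over_runs(runs):
    """Pick the best epoch of each run, then average its train/valid AUC."""
    best = [select_best_epoch(history) for history in runs]
    n = len(best)
    return {
        "train_auc": sum(b["train_auc"] for b in best) / n,
        "valid_auc": sum(b["valid_auc"] for b in best) / n,
    }


# Two hypothetical runs; the values are invented for illustration only.
run_1 = [
    {"valid_acc": 0.70, "train_auc": 0.80, "valid_auc": 0.75},
    {"valid_acc": 0.85, "train_auc": 0.90, "valid_auc": 0.88},
    {"valid_acc": 0.82, "train_auc": 0.95, "valid_auc": 0.86},
]
run_2 = [
    {"valid_acc": 0.72, "train_auc": 0.82, "valid_auc": 0.74},
    {"valid_acc": 0.78, "train_auc": 0.88, "valid_auc": 0.80},
    {"valid_acc": 0.90, "train_auc": 0.94, "valid_auc": 0.90},
]

report = average_over_runs([run_1, run_2])
print(report)
```

Note that nothing is averaged across epochs within a run. Only the single best epoch of each run contributes to the reported number.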

ans92 commented 3 years ago

Hi @ruochiz, can you please tell me what recon_loss is in the get_node_embedding function of the classifier class? Thank you.

ans92 commented 3 years ago

Hi @ruochiz. I have read in the DHNE paper that recon_loss is the reconstruction loss. Yes, that paper is helpful. Thank you.

ans92 commented 3 years ago

Hi @ruochiz, I have read the DHNE paper and found one thing that I haven't seen in the Hyper-SAGNN paper. The DHNE paper uses first-order proximity and second-order proximity. First-order proximity measures the similarity between nodes, meaning that similar nodes are more likely to form a hyperedge. Second-order proximity preserves the relationship of nodes with their neighbors. That is what I want to ask you about. Is second-order proximity also preserved in the Hyper-SAGNN paper? If yes, how is it preserved here?

The second thing I want to ask: the paper mentions that if we do information aggregation over first-order neighbors, we can get improved link prediction results. What would the flow be in that situation? Currently our flow is that we get a tuple of nodes, let's say [a,b,c]. Then we extract the features of each node. Then we calculate the static and dynamic embedding of each node based on its features. Then we use a neural network to get probabilities. If we add the information aggregation functionality, at what level can we add it? The paper says that we can do it before calculating the dynamic and static embeddings, so I am assuming we can do it during the feature extraction stage. If you can say a few words on it, that would be great. Thank you.

ans92 commented 3 years ago

@ruochiz, can you please tell me why you are using two losses in the Hyper-SAGNN model? I can see that one is BCE loss and the other is BCEwithlogits. Can you please explain why two losses are used here? Thank you.