Closed: ans92 closed this issue 3 years ago
The code and data provided in the repo should be able to reproduce the reported results on the 4 datasets we studied in the paper. Could you provide more details on how you run the code (the command and parameters), which dataset you are trying, and specifically which result you have trouble reproducing?
Thank you @ruochiz for the reply. I ran the following command:
!python /content/drive/MyDrive/Hyper-SAGNN-master/Hyper-SAGNN-master/Code/main.py --data drug -f adj
We ran it in Google Colab. I want it to reproduce the results shown in the table in the paper and also produce the graphs given in the results section and in the Appendix.
Please let me know if you need other details. Also, if possible, could you add more comments to the code to make it beginner and student friendly? Then we would be able to grasp the code more easily and understand it better. Thank you.
Hi @ruochiz, you haven't replied to my comment. Can you please guide me on how I can reproduce the results of this paper? Or is there something that I am doing wrong?
There are three tasks we discussed in the paper: The way you are running the code seems correct.
import warnings

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import average_precision_score, precision_score, recall_score, f1_score
from sklearn.metrics import roc_auc_score, accuracy_score, matthews_corrcoef
from sklearn.preprocessing import MultiLabelBinarizer, StandardScaler
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.linear_model import LogisticRegression

warnings.filterwarnings("ignore")


def main():
    # Load the embeddings saved by the model and the labels shipped with the dataset
    embedding_my = np.load("../mymodel_1.npy", allow_pickle=True)
    train_data = np.load("../DHNE/data/MovieLens/train_data.npz", allow_pickle=True)
    label = train_data['labels']

    # Turn the label collections into a binary indicator matrix
    m = MultiLabelBinarizer().fit(label)
    label = m.transform(label)

    # Keep only the rows that have at least one label
    index = (np.sum(label, axis=-1) > 0)
    embedding_my = embedding_my[index]
    label = label[index]

    print('train_num', 'micro', 'macro')
    for i in range(9):
        # Train on 10%, 20%, ..., 90% of the labeled rows
        train_num = (i + 1) * 0.1
        X1_train, X1_test, y_train, y_test = train_test_split(
            embedding_my, label, test_size=1 - train_num, random_state=42)
        str1 = 'our'
        train = X1_train
        test = X1_test

        # Standardize features using statistics from the training split only
        clf = OneVsRestClassifier(LogisticRegression())
        normalizer = StandardScaler().fit(train)
        train = normalizer.transform(train)
        test = normalizer.transform(test)

        clf.fit(train, y_train)
        y_pred = clf.predict(test)
        micro = f1_score(y_test, y_pred, average='micro')
        macro = f1_score(y_test, y_pred, average='macro')
        print("%.2f\t%.3f\t%.3f" % (train_num, micro, macro))

    # Scores from the last (90% training) split
    return micro, macro


if __name__ == "__main__":
    main()
If you have a specific question, could you point to which table or figure that you are having trouble with?
Thank you @ruochiz. Can you please let me know the loss function used in this paper? I am not able to find it in the paper.
Binary cross-entropy loss, where the positive samples are observed hyperedges and the negative samples are random node tuples that are not observed as hyperedges.
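For readers following along, here is a minimal NumPy sketch of that setup: observed hyperedges get label 1, random unobserved node tuples get label 0, and binary cross-entropy is computed on predicted probabilities. The function names, the toy hyperedges, and the constant 0.5 predictions are illustrative assumptions and are not taken from the repo; the sketch also ignores node types, which a heterogeneous dataset would need to respect when sampling.

```python
import numpy as np


def sample_negatives(positives, num_nodes, neg_per_pos, rng):
    """Draw random node tuples that do not appear among the observed hyperedges."""
    observed = {tuple(sorted(edge)) for edge in positives}
    size = len(positives[0])
    negatives = []
    while len(negatives) < neg_per_pos * len(positives):
        drawn = rng.choice(num_nodes, size=size, replace=False)
        candidate = tuple(sorted(int(n) for n in drawn))
        if candidate not in observed:
            negatives.append(candidate)
    return negatives


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    probs = np.clip(probs, eps, 1 - eps)
    return float(-np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs)))


rng = np.random.default_rng(0)
positives = [(0, 1, 2), (1, 3, 4)]          # toy observed hyperedges
negatives = sample_negatives(positives, num_nodes=6, neg_per_pos=2, rng=rng)

labels = np.array([1.0] * len(positives) + [0.0] * len(negatives))
probs = np.full(labels.shape, 0.5)          # stand-in for the model's predictions
loss = binary_cross_entropy(probs, labels)
print(negatives, loss)
```

In a real training loop the probabilities would come from the model's score for each tuple, and the negatives would typically be re-sampled every batch.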
OK, thank you again for your response. When I run the following command, it gives me an error:
!python /content/drive/MyDrive/Hyper-SAGNN-master/Hyper-SAGNN-master/Code/main.py --data MovieLens -f adj
The error is as follows:
2021-06-27 07:14:56.241406: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
1.0 -0.5
model_no_randomwalk
no specific train weight
Node type num [146 70 5]
[5. 5. 5. ... 5. 5. 5.]
[5. 5. 5. ... 5. 5. 5.] 5.0 5.0
walk type
adding pad idx
dict_size 1154 1436
train data amount 1154
Traceback (most recent call last):
File "/content/drive/MyDrive/Hyper-SAGNN-master/Hyper-SAGNN-master/Code/main.py", line 638, in
Can you help me understand why this happens? The same error occurred with the GPS dataset. Wordnet does not show this error. Thank you for your help.
I'm guessing you are using tf2.0 or higher? The TensorFlow API has changed since tf2.0; things like session() no longer exist. If you just use the -f adj mode, I can upload a newer version that does not use TensorFlow later this week.
Ok Thank you
Hi @ruochiz, I am a student and have limited knowledge of this area. Can you please explain why we take negative samples along with positive samples, and why we use both a static embedding and a dynamic embedding? I have read the paper but couldn't properly understand why we have both static and dynamic embeddings.
Another thing I want to ask: the paper says that this new model can work on variable-sized hyperedges. But I have looked at all the datasets and found that they all have three-node hyperedges in both training and testing. So how have you verified that it can work on variable-sized hyperedges?
I have also found that only the wordnet and MovieLens datasets have labels and index labels. The GPS and drug datasets do not have labels. And where labels exist, they are not available for all the hyperedges. For example, there are 117153 hyperedges in the training data of wordnet but only 40750 labels for hyperedges in the training data. Can you please help me understand all the above points? I would be very thankful for your time and help.
OK, thank you @ruochiz. Your answers were great and helped me a lot. If you allow, I want to ask one more thing. The paper says that if we also include first-order neighbors in an information aggregation process before calculating the static and dynamic embeddings, the results can be improved. How can I know that, let's say, nodes A and B are first-order neighbors of node C? Currently the training dataset has tuples of nodes, so to get the first-order neighbors of nodes, can I use the same dataset or do I need a completely different form of dataset? If you can clear up this point, that would be great. Thank you again for all your help.
You can construct a graph based on the hypergraph via decomposition or find another graph that contains these nodes as well.
Thank you @ruochiz. Your first idea seems better to me. Can you share more details about how I can construct a graph from the hypergraph by decomposing it? I mean some library or other resource. Thank you.
The decomposition of a hypergraph can be pretty simple. For instance, just decompose a hyperedge connecting (n1, n2, n3) into three edges (n1, n2), (n2, n3), (n1, n3).
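A minimal sketch of that decomposition, assuming each hyperedge is a tuple of node ids: every hyperedge becomes all pairwise edges among its nodes (often called clique expansion), and the first-order neighbors of a node can then be read off the resulting graph. The function names and toy hyperedges are illustrative and are not from the repo.

```python
from itertools import combinations


def clique_expand(hyperedges):
    """Replace each hyperedge with all pairwise edges among its nodes."""
    edges = set()
    for hyperedge in hyperedges:
        for u, v in combinations(sorted(set(hyperedge)), 2):
            edges.add((u, v))
    return edges


def first_order_neighbors(edges):
    """Build a node -> set of adjacent nodes map from undirected edges."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return neighbors


hyperedges = [(1, 2, 3), (3, 4, 5)]     # toy hyperedges
edges = clique_expand(hyperedges)
neighbors = first_order_neighbors(edges)
print(sorted(edges))
print(neighbors[3])
```

Because it works on plain tuples, the same function handles hyperedges of any size, and the training tuples themselves can serve as input.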
Yes, you are right. But I have read in the paper that not all hyperedges are decomposable into pairwise edges. The introduction mentions, citing the DHNE (Deep Hyper-Network Embedding) paper, that DHNE suggested the existence of heterogeneous indecomposable hyperedges. So do you think this method of decomposing the hypergraph into pairwise edges will work?
Second, the paper says that we need to aggregate information over all the first-order neighbors before calculating the static/dynamic embedding of a node. What is meant by this information aggregation process?
Again, thank you for your help and effort.
Another thing that I want to ask about is the results of the current model. I ran the GPS dataset with random walk and got this output:
And I got the following output with adj (encoder method) on the GPS dataset:
My question is about the output that you reported in the paper. How did you calculate the AUC and AUPR values in the table in the paper? Did you take the last epoch's results or the average over all epochs, and did you use training results or validation results?
The paper shows two tables. Table 1 gives AUC and AUPR values for network reconstruction, and Table 2 gives performance evaluation based on AUROC and AUPR for hyperedge/edge prediction. I have run the model and got the results above. Do they correspond to Table 1 (network reconstruction) or Table 2 (hyperedge/edge prediction)?
I am very thankful for your help and support.
OK, great. Thanks for your reply. Can you please clarify the third point, about how you filled in the values in the table? I mean, did you take the average AUC/AUPR over 300 epochs, the AUC/AUPR of the last epoch, or use some other method? Thank you.
@ruochiz Sir, if you are comfortable, can I ask you a few more questions regarding the code of this paper and the output of its functions? Thank you.
Hi @ruochiz, the following are two lines from main.py:
train_weight = train_weight / train_weight_mean * neg_num
test_weight = test_weight / train_weight_mean * neg_num
These are lines 586 and 587, I guess. My confusion is that train_weight and test_weight are unchanged after these lines. I think this is because the weights are first divided and then multiplied by neg_num, so when they are multiplied by neg_num they get their original values back. Don't you think brackets are missing there, so that the weights actually change? Currently train_weight is [5. 5. ... 5.] and test_weight is [1. 1. ... 1.] both before and after this operation.
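The arithmetic in question can be checked in isolation. The sketch below assumes neg_num = 5 and uses short toy arrays; both are assumptions for illustration, not values read from main.py. Python evaluates `w / mean * neg_num` left to right, that is, `(w / mean) * neg_num`, which rescales the weights so that the training mean becomes neg_num. When the training mean already equals neg_num, as with all-5 weights here, nothing changes.

```python
import numpy as np

neg_num = 5                                  # assumed value for illustration
train_weight = np.full(4, 5.0)               # mirrors the reported [5. 5. ... 5.]
test_weight = np.ones(4)                     # mirrors the reported [1. 1. ... 1.]
train_weight_mean = train_weight.mean()      # 5.0

# Same expressions as the two quoted lines
train_norm = train_weight / train_weight_mean * neg_num
test_norm = test_weight / train_weight_mean * neg_num

# With non-uniform weights the values do change; only the mean is pinned to neg_num
varied = np.array([1.0, 2.0, 3.0, 6.0])
varied_norm = varied / varied.mean() * neg_num

# What a bracketed denominator would compute instead
bracketed = train_weight / (train_weight_mean * neg_num)

print(train_norm, test_norm, varied_norm, varied_norm.mean(), bracketed)
```

Whether the repo intends the weights to be rescaled this way is a question for the author; the sketch only shows why the reported values are identical before and after.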
I understand that you perhaps don't have much experience in this field. I would suggest you get familiar with some of the deep learning and graph representation learning basics first.
Thank you for your reply. Yes, I will definitely look into the basics of graph representation learning. But even then, you are the best person to ask about this paper and its code. I really appreciate your help and cooperation. Thanks again.
I have two questions regarding the output of the code. First, what does this output mean? It is the output of the generate_H function:
H[0]:
(0, 54) 2.236068
(0, 105) 2.236068
(0, 213) 2.236068
(0, 550) 2.236068
(1, 130) 2.236068
(1, 245) 2.236068
.......
(144, 584) 2.236068
(144, 1049) 2.236068
(145, 665) 2.236068
I know that the first column represents the node labels, the second column represents the number from the input length, and the third column represents the sqrt of the weight. But I couldn't understand the idea behind this representation and what it is telling us. This output is then converted to the following embeddings:
(0, 220) 5.0
(0, 219) 5.0
(0, 153) 15.0
(0, 150) 10.0
(1, 157) 10.0
(1, 220) 15.0
.......
(144, 147) 5.0
(144, 144) 15.0
(145, 220) 5.0
(145, 164) 5.0
Their size has been reduced from 1154 to 221. I know the size was reduced by the dot product. But why do we need to reduce the size, and what does the output now refer to?
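One way to read these numbers, offered as an interpretation rather than a description of what generate_H actually does: if H is a node-by-hyperedge incidence matrix whose nonzero entries are sqrt(weight), then the product of H with its transpose is a node-by-node matrix in which each entry is the weight times the number of hyperedges the two nodes share. That would match 2.236068 being approximately sqrt(5) and the products being multiples of 5. The toy matrix below is hypothetical.

```python
import numpy as np

weight = 5.0
# Toy hypergraph: 4 nodes, 3 hyperedges (columns); entries are sqrt(weight)
hyperedges = [(0, 1, 2), (0, 1, 3), (0, 2, 3)]
H = np.zeros((4, len(hyperedges)))
for col, nodes in enumerate(hyperedges):
    H[list(nodes), col] = np.sqrt(weight)

# Node-by-node matrix: weight * (number of shared hyperedges)
A = H @ H.T

print(H.shape, A.shape)
print(A)
```

Here the second dimension shrinks from the number of hyperedges to the number of nodes, which is the same kind of size reduction described above (1154 hyperedges down to 221 nodes).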
In the next step we get the initial embeddings as:
(0, 74) 0.05263158
(0, 73) 0.1
(0, 7) 1.0
(1, 72) 0.14285715
(1, 11) 1.0
.......
(144, 70) 0.09090909
(144, 1) 1.0
(145, 74) 0.05263158
(145, 18) 1.0
What do these initial embeddings represent, now that their size has decreased further? What I understood is that each row represents the embedding of a single node. But if that is the case, why do we have multiple embeddings for the same node? Is this for calculating dynamic embeddings? Can you please make these things clear?
My second question is about the columns of the training data, which you have changed for multiple node types. Our initial training data was:
train_data
[[ 93 57 4]
[ 17 7 0]
[ 26 13 4]
...
[108 15 1]
[113 1 0]
[141 41 3]]
But you have changed it as below for multiple node types:
train_data
[[ 93 203 220]
[ 17 153 216]
[ 26 159 220]
...
[108 161 217]
[113 147 216]
[141 187 219]]
I can see that only the second and third columns have changed and the first column remains unchanged. Can you please explain why you made this change and what its benefit is?
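The two train_data dumps are consistent with shifting each column by the number of nodes in the preceding node types (146, 70 and 5 in the log line "Node type num [146 70 5]"), so that every node gets a single id in one shared range of 221. This is inferred from the numbers posted in this thread, not confirmed by the author, and the variable names are illustrative.

```python
import numpy as np

num_per_type = np.array([146, 70, 5])        # from the log: Node type num [146 70 5]

# Offset of each node type = total number of nodes in all earlier types
offsets = np.concatenate(([0], np.cumsum(num_per_type)[:-1]))   # [0, 146, 216]

train_data = np.array([
    [93, 57, 4],
    [17, 7, 0],
    [26, 13, 4],
    [108, 15, 1],
    [113, 1, 0],
    [141, 41, 3],
])

# Broadcasting adds offsets[j] to column j, giving globally unique node ids
global_ids = train_data + offsets
print(global_ids)
```

The first column is unchanged because its offset is 0. With unique global ids, a single embedding table or adjacency matrix over all 221 nodes can be indexed directly, regardless of node type.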
And lastly, you have referred to test data. I can see test_data in the .npz files of the datasets and also in the code. Do I need to change the code to get test accuracy, or is it already shown in the form of validation results? I can see the results of training and validation, so I am assuming that the validation results are for test_data. Is that right? Or do I need to run the model separately on test_data? I would be very thankful for your help.
Thank you @ruochiz for your comment on my previous question about calculating the results. You said that you ran the model with the best parameters and then took the average of the train and test results. I want to clear up one point that is confusing me. Currently there are 300 epochs during training, so the first time you train the model you get 300 AUC/AUPR values. Do I have to average all 300 AUC/AUPR values to get the AUC/AUPR value of the first training experiment, then similarly average all 300 values to get the final AUC/AUPR of the second experiment, and so on? Is this the way to calculate the AUC/AUPR values for the training phase?
And as far as I know, there are no epochs in the test phase, so there is a single AUC/AUPR output value each time. Am I right?
I would suggest you go through the DHNE paper first. Part of our structure is built on it, and it is the main baseline method we compared to. It would help a lot if you understand the structure and experimental setting of DHNE.
Hi @ruochiz, can you please tell me what recon_loss is in the get_node_embedding function of the classifier class? Thank you.
Hi @ruochiz. I have read in the DHNE paper that recon_loss is the reconstruction loss. Yes, that paper is helpful. Thank you.
Hi @ruochiz, I have read the DHNE paper and found one thing that I haven't seen in the Hyper-SAGNN paper. The DHNE paper uses first-order proximity and second-order proximity. With first-order proximity they measure the similarity between nodes, meaning that similar nodes are more likely to form a hyperedge. With second-order proximity they preserve the relationship of nodes with their neighbors. That is what I want to ask you about. Is second-order proximity also preserved in Hyper-SAGNN? If yes, how?
The second thing I want to ask: the paper mentions that if we aggregate information over first-order neighbors, we can get improved link prediction results. What would the flow be in that situation? Currently the flow is: we get a tuple of nodes, let's say [a, b, c]; we extract the features of each node; we calculate the static and dynamic embedding of each node from its features; then we use a neural network to get probabilities. If we add information aggregation, at what level can we add it? The paper says we can do it before calculating the dynamic and static embeddings, so I am assuming we can do it during the feature extraction stage. If you could say a few words on this, that would be great. Thank you.
@ruochiz, can you please tell me why you are using two losses in the Hyper-SAGNN model? I can see that one is BCE loss and the other is BCEWithLogits. Can you please explain why you are using two losses here? Thank you.
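For readers with the same question, the general difference between the two is that a BCE loss expects probabilities (a sigmoid has already been applied), while a BCE-with-logits loss expects raw scores and applies the sigmoid internally in a numerically stable way. The NumPy sketch below illustrates that relationship with hand-written functions and toy numbers; it is not the repo's code and says nothing about why the model uses each loss where it does.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def bce(probs, targets, eps=1e-12):
    """Binary cross-entropy on probabilities in (0, 1)."""
    probs = np.clip(probs, eps, 1 - eps)
    return float(-np.mean(targets * np.log(probs) + (1 - targets) * np.log(1 - probs)))


def bce_with_logits(logits, targets):
    """Binary cross-entropy on raw scores, using a numerically stable form."""
    per_item = np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))
    return float(np.mean(per_item))


logits = np.array([2.0, -1.0, 0.5])      # toy raw scores
targets = np.array([1.0, 0.0, 1.0])

loss_from_probs = bce(sigmoid(logits), targets)
loss_from_logits = bce_with_logits(logits, targets)
print(loss_from_probs, loss_from_logits)
```

Both calls give the same value up to floating-point error, so the choice between them mainly depends on whether the tensor being scored has already passed through a sigmoid.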
Hi, thank you for the great code. I can run the code successfully, but it doesn't show any output like that in the paper. I would like to know how I can reproduce the results and visualizations shown in the paper. Your guidance would be highly appreciated.