ma-compbio / Hyper-SAGNN

hypergraph representation learning, graph neural network
MIT License

Some errors running the code #7

Closed sml0399 closed 3 years ago

sml0399 commented 3 years ago

Hi, thanks for sharing your great code. I encountered some errors while running it. I used Python 3.7.10 with TensorFlow 1.14.0 and PyTorch 1.9.0 in an Anaconda3 environment (CUDA 11.1 with an RTX 3090). (A summary of my question is at the end of this issue.)

In the Code directory, I executed the following:

python main.py --data MovieLens -f walk

This returned the following error:

torch.Size([15594, 64])
Traceback (most recent call last):
  File "main.py", line 732, in <module>
    summary(classifier_model, (3,))
  File "/home/user/Desktop/ab/Hyper-SAGNN/Code/torchsummary.py", line 73, in summary
    model(*x)
  File "/home/user/anaconda3/envs/hyper_sagnn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/Desktop/ab/Hyper-SAGNN/Code/Modules.py", line 272, in forward
    dynamic, static, attn = self.get_embedding(x, slf_attn_mask, non_pad_mask, return_recon)
  File "/home/user/Desktop/ab/Hyper-SAGNN/Code/Modules.py", line 241, in get_embedding
    dynamic, static, attn = self.encode1(x, x, slf_attn_mask, non_pad_mask)
  File "/home/user/anaconda3/envs/hyper_sagnn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1071, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/home/user/Desktop/ab/Hyper-SAGNN/Code/Modules.py", line 614, in forward
    dynamic = self.pff_n1(dynamic * non_pad_mask) * non_pad_mask
  File "/home/user/anaconda3/envs/hyper_sagnn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1071, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/home/user/Desktop/ab/Hyper-SAGNN/Code/Modules.py", line 355, in forward
    output = self.w_stack[i](output)
  File "/home/user/anaconda3/envs/hyper_sagnn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1071, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/home/user/anaconda3/envs/hyper_sagnn/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 298, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/user/anaconda3/envs/hyper_sagnn/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 295, in _conv_forward
    self.padding, self.dilation, self.groups)
TypeError: conv1d() received an invalid combination of arguments - got (Tensor, Parameter, Parameter, tuple, tuple, tuple, int), but expected one of:
 * (Tensor input, Tensor weight, Tensor bias, tuple of ints stride, tuple of ints padding, tuple of ints dilation, int groups)
      didn't match because some of the arguments have invalid types: (Tensor, Parameter, Parameter, tuple, tuple, tuple, int)
 * (Tensor input, Tensor weight, Tensor bias, tuple of ints stride, str padding, tuple of ints dilation, int groups)
      didn't match because some of the arguments have invalid types: (Tensor, Parameter, Parameter, tuple, tuple, tuple, int)

This error seems to come from line 337 of Modules.py, "self.w_stack.append(nn.Conv1d(dims[i], dims[i + 1], 1, use_bias))", so I changed it to "self.w_stack.append(nn.Conv1d(dims[i], dims[i + 1], 1, bias=use_bias))". (This removed the error, but I'm not sure it is the right solution.)
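For what it's worth, a minimal sketch of the mismatch as I understand it (the channel sizes here are illustrative, not the ones in Modules.py):

import torch
import torch.nn as nn

use_bias = True

# nn.Conv1d's signature is (in_channels, out_channels, kernel_size, stride=1,
# padding=0, dilation=1, groups=1, bias=True, ...), so a positional use_bias
# lands in the stride slot. A bool stride slipped through PyTorch <= 1.8.1 but
# trips the stricter argument dispatch in 1.9, raising the TypeError above.
# broken = nn.Conv1d(64, 64, 1, use_bias)   # stride=True -- fails on 1.9

# Binding the flag to the bias keyword gives the intended layer.
fixed = nn.Conv1d(64, 64, 1, bias=use_bias)

x = torch.randn(2, 64, 10)   # (batch, channels, length)
print(fixed(x).shape)        # torch.Size([2, 64, 10])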

After this change, another error occurred:

Data file:  ../walks/MovieLens/p2_q0.25_r10_l40_walks.txt
Vocab size:  15593  + UNK
Words per epoch:  6237200
params to be trained 2070189
[ Epoch 0 of 300 ]
Traceback (most recent call last):                                                                                                                                                                                               
  File "main.py", line 761, in <module>
    optimizer=[optimizer], epochs=300, batch_size=batch_size, only_rw=False)
  File "main.py", line 301, in train
    args, model, loss, training_data, optimizer, batch_size, only_rw, train_type)
  File "main.py", line 205, in train_epoch
    pred, batch_y, loss_bce, loss_recon = train_batch_hyperedge(model_1, loss_1, batch_edge, batch_edge_weight, type, y=batch_y)
  File "main.py", line 115, in train_batch_hyperedge
    x, y, w = generate_negative(x, "train_dict", type, w)
  File "main.py", line 448, in generate_negative
    x = pad_sequence(x, batch_first=True, padding_value=0).to(device)
  File "/home/user/anaconda3/envs/hyper_sagnn/lib/python3.7/site-packages/torch/nn/utils/rnn.py", line 363, in pad_sequence
    return torch._C._nn.pad_sequence(sequences, batch_first, padding_value)
TypeError: pad_sequence(): argument 'sequences' (position 1) must be tuple of Tensors, not Tensor

This error seems to come from line 24 of utils.py, "return torch.as_tensor(vec, dtype = dtype)", so I changed it to "return [torch.as_tensor(v, dtype=dtype) for v in vec]". After this change, no further errors occurred during execution. But the training results (attached as Result_MovieLens_walk.txt) show "skipgram: 0.0000, recon: 0.0000", and I'm not sure this is the right result. Also, the AUC and AUPR values seem much lower than the results in the paper. I'm worried that there may be more parts that do not work as intended but are not detected as errors.
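As a sanity check of my change, here is a small sketch (to_tensor_list is a hypothetical name standing in for the helper at line 24 of utils.py):

import torch
from torch.nn.utils.rnn import pad_sequence

# pad_sequence expects a list/tuple of Tensors, one per hyperedge, so the
# helper now returns a list of per-edge tensors instead of one stacked tensor.
def to_tensor_list(vec, dtype=torch.long):
    # before: return torch.as_tensor(vec, dtype=dtype)
    return [torch.as_tensor(v, dtype=dtype) for v in vec]

edges = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]   # hyperedges of different sizes
x = pad_sequence(to_tensor_list(edges), batch_first=True, padding_value=0)
print(x)
# tensor([[1, 2, 3, 0],
#         [4, 5, 0, 0],
#         [6, 7, 8, 9]])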

So my question is (since these errors occurred in my environment): could you check whether the code works as intended in this environment, or let me know the exact environment (with detailed versions) in which it runs as intended without errors? It would also be helpful if you could share an example result of "python main.py --data MovieLens -f walk", like the attached Result_MovieLens_walk.txt file, so that I can check whether the code is running correctly.

Result_MovieLens_walk.txt

ruochiz commented 3 years ago

Hi,

Thanks for your interest in our work. Your fix for the first error is correct. This bug appears starting with PyTorch 1.9.0 (up to 1.8.1 it is still fine; I ran into the same bug in another project of mine that uses Hyper-SAGNN). I'll push an updated version to fix it.

For the second error, your fix looks reasonable to me as well. However, if you are mainly running this on the four triplet datasets, you can also simply comment out the pad_sequence call. pad_sequence() is only needed for non-uniform hypergraphs, where hyperedges of different sizes are padded with 0 to reach the same size.
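For instance, on a uniform (triplet) dataset every hyperedge already has the same size, so a plain tensor conversion would suffice (a toy sketch):

import torch

triplets = [[1, 2, 3], [4, 5, 6]]   # every hyperedge has size 3
x = torch.as_tensor(triplets)       # shape (2, 3); no padding required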

As for your question about the environment we tested: the original code was tested on PyTorch 1.4.0.

Now, as for why the performance changed: I can see that your attached results reach 0.793 AUPR, which is slightly lower than the 0.810 we reported, and then drop to around 0.72 due to overfitting. Possible reasons are:

  1. The code has changed compared to the original version. In the original version, the modified node2vec (random walk + word2vec) is used to generate embeddings as initialization, and the node2vec loss is then jointly optimized during training (so the skipgram loss should not be zero). However, that node2vec loss depends on the TensorFlow implementation of random sampling for the word2vec model and requires compiling a .so file. We received feedback that this is a complicated process and decided to remove the dependency on this .so file and the corresponding loss. (As you can see, I recently added a main_pytorch.py to get rid of TensorFlow entirely, since TF 2.0 changed most of the TF 1.0 API.) That could be one reason for the slight drop in performance.

  2. Another thing I suspect changes the performance is the type of walk. The performance in the paper is reported with our modified random walk, while the code in this repo uses the normal random walk by default. To use our modified one, pass the "-w hyper" parameter as well (see the example after this list; I should have included a description of this parameter in the README).

  3. Changes in gensim, which initializes the node2vec embeddings.
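For example, to use the modified walk, combining it with the command from the original report (assuming the flags compose as usual):

python main.py --data MovieLens -f walk -w hyper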

Overall, the -f adj option yields more stable results than the -f walk option, as it depends less on other libraries.
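For example (assuming the same MovieLens setup as above):

python main.py --data MovieLens -f adj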