pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License

`nvrtc: error` when using `GATConv`. #3723

Closed RenzeLou closed 2 years ago

RenzeLou commented 2 years ago

🐛 Bug

Environment

Additional context

Dear authors,

Recently, I tried to use "Graph Attention Networks" with PyG, namely GATConv. However, I get an unexpected error when I run it on a GPU:

Exception has occurred: RuntimeError
nvrtc: error: failed to open libnvrtc-builtins.so.11.1.
  Make sure that libnvrtc-builtins.so.11.1 is installed correctly.
nvrtc compilation failed: 

#define NAN __int_as_float(0x7fffffff)
#define POS_INFINITY __int_as_float(0x7f800000)
#define NEG_INFINITY __int_as_float(0xff800000)

template<typename T>
__device__ T maximum(T a, T b) {
  return isnan(a) ? a : (a > b ? a : b);
}

template<typename T>
__device__ T minimum(T a, T b) {
  return isnan(a) ? a : (a < b ? a : b);
}

extern "C" __global__
void fused_sub_exp(float* t0, float* t1, float* aten_exp) {
{
  if (512 * blockIdx.x + threadIdx.x<16 ? 1 : 0) {
    float v = __ldg(t0 + 512 * blockIdx.x + threadIdx.x);
    float v_1 = __ldg(t1 + 512 * blockIdx.x + threadIdx.x);
    aten_exp[512 * blockIdx.x + threadIdx.x] = expf(v - v_1);
  }
}
}
  File "/media/ps/data/lourenze/ECD/betav0.1/HCTC/models/dialogue_gcn.py", line 113, in forward
    out_2 = self.conv2(out_1,edge_index)  ## [2*batch_size+batch_cause_num, 600]
  File "/media/ps/data/lourenze/ECD/betav0.1/HCTC/models/dialogue_gcn.py", line 386, in forward
    word_output = self.graph_network(target_node,cause_node,emotion_node,word_node,word_mask,target_idx, cause_idx)[0]  ## [batch_cause_num, 512, 600]
  File "/media/ps/data/lourenze/ECD/betav0.1/HCTC/train_GCN.py", line 147, in main
    loss,_,_,_ = model(*batch_input)
  File "/media/ps/data/lourenze/ECD/betav0.1/HCTC/train_GCN.py", line 259, in <module>
    main()

I understand that this message points to an incompatibility with my libnvrtc-builtins.so (11.4.152), but I am pretty confused: there is no such error when I use other classes (e.g., GraphConv, RGCNConv).

rusty1s commented 2 years ago

GATConv makes use of our CUDA kernels in torch-scatter, while some GNN ops like GraphConv and RGCNConv are implemented in pure PyTorch.

As such, it looks like you have installed torch-scatter with CUDA 11.1 wheels, while your local CUDA version points to 11.4.
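For anyone hitting the same message, here is a minimal sketch of how the mismatch can be confirmed. It only parses the version out of the error text and compares it with the local library version quoted in the report. The helper names (`required_cuda_version`, `major_minor`, `pytorch_cuda_version`) are made up for illustration and are not part of PyG or torch-scatter; `torch.version.cuda` is the only library attribute used, and only if PyTorch is importable.

```python
import re

# Matches the versioned library name that NVRTC reports as missing.
_NVRTC_RE = re.compile(r"libnvrtc-builtins\.so\.(\d+)\.(\d+)")


def required_cuda_version(error_message):
    """Return the 'major.minor' CUDA version named in an nvrtc error, or None."""
    match = _NVRTC_RE.search(error_message)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def major_minor(version):
    """Reduce a version string such as '11.4.152' to '11.4'."""
    return ".".join(version.split(".")[:2])


def pytorch_cuda_version():
    """CUDA version PyTorch was built against, or None if PyTorch is missing."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda


message = "nvrtc: error: failed to open libnvrtc-builtins.so.11.1."
required = required_cuda_version(message)
local = major_minor("11.4.152")  # local library version quoted in the report
print(f"required={required} local={local} match={required == local}")
# Optionally, in the affected environment: print(pytorch_cuda_version())
```

If the required and local versions differ, as they do here, the installed wheels and the local CUDA libraries do not line up.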

RenzeLou commented 2 years ago

Thanks a lot for your kind reply! I will try to solve it.

RenzeLou commented 2 years ago

I have solved this problem by installing multiple CUDA toolkit versions side by side.

Just follow this blog post and install the CUDA version that is compatible with your torch-scatter: https://towardsdatascience.com/installing-multiple-cuda-cudnn-versions-in-ubuntu-fcb6aa5194e2

After that you can switch between CUDA toolkit versions, and there is no need to change anything else in your environment.
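In case it helps, below is a rough sketch of how one might check which toolkit installs actually ship the library named in the error. It assumes the toolkits live under `/usr/local/cuda-<version>/lib64`, which is the usual default location on Linux but may differ on your machine. The function names are hypothetical helpers, not part of any CUDA or PyG tooling.

```python
import glob
import os
import re


def find_nvrtc_builtins(pattern="/usr/local/cuda-*/lib64"):
    """Map 'major.minor' -> directory for each libnvrtc-builtins library found.

    The default pattern is an assumption about where toolkits are installed.
    """
    found = {}
    for lib_dir in sorted(glob.glob(pattern)):
        candidates = glob.glob(os.path.join(lib_dir, "libnvrtc-builtins.so.*"))
        for path in candidates:
            m = re.search(r"\.so\.(\d+)\.(\d+)", os.path.basename(path))
            if m:
                found[f"{m.group(1)}.{m.group(2)}"] = lib_dir
    return found


def ld_library_path_with(lib_dir, current=""):
    """Return an LD_LIBRARY_PATH value with lib_dir placed first, no duplicates."""
    rest = [p for p in current.split(":") if p and p != lib_dir]
    return ":".join([lib_dir] + rest)


installs = find_nvrtc_builtins()
wanted = "11.1"  # version named in the error message
if wanted in installs:
    value = ld_library_path_with(installs[wanted],
                                 os.environ.get("LD_LIBRARY_PATH", ""))
    print(f"export LD_LIBRARY_PATH={value}")
else:
    print(f"No libnvrtc-builtins for CUDA {wanted} found; installs: {installs}")
```

The script only prints a suggested `export` line. The variable normally has to be set in the shell before Python starts, since the dynamic loader reads it at process startup.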