`nvrtc: error` when using `GATConv`.

RenzeLou commented 2 years ago

🐛 Bug

Environment

PyG version: 2.0.2
PyTorch version: 1.8.0+cu111
OS (e.g., Linux): Linux (Ubuntu 18.04LTS)
Python version (e.g., 3.9): 3.8.0
CUDA/cuDNN version: 11.4 / 8.2.4
How you installed PyTorch and PyG: pip
Any other relevant information (e.g., version of torch-scatter): torch_scatter-2.0.7; torch_sparse-0.6.9; torch_spline_conv-1.2.1; torch_cluster-1.5.9

Additional context

Dear authors,

I recently tried to use Graph Attention Networks in PyG, namely `GATConv`. However, I get an unexpected error when running it on the GPU:

Exception has occurred: RuntimeError
nvrtc: error: failed to open libnvrtc-builtins.so.11.1.
  Make sure that libnvrtc-builtins.so.11.1 is installed correctly.
nvrtc compilation failed: 

#define NAN __int_as_float(0x7fffffff)
#define POS_INFINITY __int_as_float(0x7f800000)
#define NEG_INFINITY __int_as_float(0xff800000)

template<typename T>
__device__ T maximum(T a, T b) {
  return isnan(a) ? a : (a > b ? a : b);
}

template<typename T>
__device__ T minimum(T a, T b) {
  return isnan(a) ? a : (a < b ? a : b);
}

extern "C" __global__
void fused_sub_exp(float* t0, float* t1, float* aten_exp) {
{
  if (512 * blockIdx.x + threadIdx.x<16 ? 1 : 0) {
    float v = __ldg(t0 + 512 * blockIdx.x + threadIdx.x);
    float v_1 = __ldg(t1 + 512 * blockIdx.x + threadIdx.x);
    aten_exp[512 * blockIdx.x + threadIdx.x] = expf(v - v_1);
  }
}
}
  File "/media/ps/data/lourenze/ECD/betav0.1/HCTC/models/dialogue_gcn.py", line 113, in forward
    out_2 = self.conv2(out_1,edge_index)  ## [2*batch_size+batch_cause_num, 600]
  File "/media/ps/data/lourenze/ECD/betav0.1/HCTC/models/dialogue_gcn.py", line 386, in forward
    word_output = self.graph_network(target_node,cause_node,emotion_node,word_node,word_mask,target_idx, cause_idx)[0]  ## [batch_cause_num, 512, 600]
  File "/media/ps/data/lourenze/ECD/betav0.1/HCTC/train_GCN.py", line 147, in main
    loss,_,_,_ = model(*batch_input)
  File "/media/ps/data/lourenze/ECD/betav0.1/HCTC/train_GCN.py", line 259, in <module>
    main()

I understand that this message points to an incompatibility with my installed libnvrtc-builtins.so (11.4.152), but I am quite confused: no such error occurs when I use other classes (e.g., GraphConv, RGCNConv).
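
To confirm which file the message is asking for, a small check along these lines may help. It is only a sketch: the helper names `required_nvrtc_builtins` and `can_load` are made up for illustration, and the check only asks the dynamic loader whether that exact file name can be opened from the current process.

```python
import ctypes
import re


def required_nvrtc_builtins(error_text):
    """Return the libnvrtc-builtins file name mentioned in an nvrtc error, or None."""
    match = re.search(r"libnvrtc-builtins\.so\.\d+\.\d+", error_text)
    return match.group(0) if match else None


def can_load(lib_name):
    """Return True if the dynamic loader can open lib_name from this process."""
    try:
        ctypes.CDLL(lib_name)
        return True
    except OSError:
        # Raised when the loader cannot find or open the shared library.
        return False


# Error line copied from the report above.
message = "nvrtc: error: failed to open libnvrtc-builtins.so.11.1."
lib_name = required_nvrtc_builtins(message)
print(lib_name, "loadable:", can_load(lib_name))
```

If the second value printed is `False`, the loader cannot see that version, even though a different version (here 11.4) may be installed.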

rusty1s commented 2 years ago

GATConv makes use of our CUDA kernels in torch-scatter, while some GNN ops like GraphConv and RGCNConv are implemented in pure PyTorch.

As such, it looks like you have installed torch-scatter with CUDA 11.1 wheels, while your local CUDA version points to 11.4.
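
A mismatch of this kind can be checked by comparing the CUDA tag in a wheel's version string with the local toolkit version. The following is a minimal sketch, not part of PyG: `wheel_cuda_version` and `same_cuda_version` are hypothetical helpers, and the code assumes tags of the form `+cuXYZ` (as in `1.8.0+cu111` from the report), where the last digit is the minor version.

```python
import re


def wheel_cuda_version(version):
    """Parse a version such as '1.8.0+cu111' into (11, 1); None for non-CUDA builds."""
    match = re.search(r"\+cu(\d{2,})$", version)
    if match is None:
        return None
    digits = match.group(1)
    # Assumption: the last digit is the minor version, the rest is the major.
    return int(digits[:-1]), int(digits[-1])


def same_cuda_version(wheel_version, local_cuda):
    """Compare the wheel's CUDA tag against a local version string such as '11.4'."""
    wheel = wheel_cuda_version(wheel_version)
    major, minor = (int(part) for part in local_cuda.split(".")[:2])
    return wheel == (major, minor)


# Values taken from the environment listed in the report.
print(wheel_cuda_version("1.8.0+cu111"))         # (11, 1)
print(same_cuda_version("1.8.0+cu111", "11.4"))  # False
```

In a real environment the first argument could come from `torch.__version__`, and `torch.version.cuda` reports the CUDA version PyTorch itself was built against.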

RenzeLou commented 2 years ago

Thanks a lot for your kind reply! I will try to solve it.

RenzeLou commented 2 years ago

I solved this problem by installing multiple CUDA toolkits side by side.

Follow this blog post and install the CUDA version that is compatible with your torch-scatter: https://towardsdatascience.com/installing-multiple-cuda-cudnn-versions-in-ubuntu-fcb6aa5194e2

You can then use different CUDA toolkit versions without changing anything else in your environment.
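
With several toolkits installed, the one that ships the requested file can be located with a short scan. This sketch assumes the toolkits live in directories named `cuda-<version>` under a common root such as `/usr/local`, with libraries in `lib64`; that layout and the function name `find_toolkits_with` are assumptions, so adjust them to your system.

```python
from pathlib import Path


def find_toolkits_with(lib_name, root="/usr/local"):
    """Return the cuda-* directories under root whose lib64 contains lib_name."""
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    return sorted(
        toolkit
        for toolkit in root_path.glob("cuda-*")
        if (toolkit / "lib64" / lib_name).exists()
    )


# Library name taken from the error message in the report.
for toolkit in find_toolkits_with("libnvrtc-builtins.so.11.1"):
    print(toolkit / "lib64")
```

Adding a printed directory to `LD_LIBRARY_PATH` before launching Python is one common way to make that toolkit's libraries visible to the loader.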
