cupy.cuda.driver.CUDADriverError: CUDA_ERROR_INVALID_PTX: a PTX JIT compilation failed

micaeltchapmi commented 5 years ago

Hi,

I run into this error when training. Training used to work after I installed all the required packages in my Anaconda environment, but it no longer does, and I'm not sure what I changed.

cupy.cuda.driver.CUDADriverError: CUDA_ERROR_INVALID_PTX: a PTX JIT compilation failed

I created a new environment and reinstalled all the packages, but I can't get rid of this error. Any clue what might be going on? I'm using CUDA 9.0, CuPy 5.3, and PyTorch 0.3.1.

Here is the stack trace:

  File "learning/main.py", line 388, in <module>
    main()
  File "learning/main.py", line 287, in main
    acc, loss, oacc, avg_iou = train()
  File "learning/main.py", line 184, in train
    outputs = model.ecc(embeddings)
  File "/cvgl2/u/mtchapmi/programs/anaconda3/envs/spgcondaenv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/cvgl2/u/mtchapmi/superpoint_graph/learning/graphnet.py", line 97, in forward
    input = module(input)
  File "/cvgl2/u/mtchapmi/programs/anaconda3/envs/spgcondaenv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/cvgl2/u/mtchapmi/superpoint_graph/learning/modules.py", line 54, in forward
    input = ecc.GraphConvFunction(nc, nc, idxn, idxe, degs, degs_gpu, self._edge_mem_limit)(hx, weights)
  File "/cvgl2/u/mtchapmi/superpoint_graph/learning/ecc/GraphConvModule.py", line 67, in forward
    cuda_kernels.conv_aggregate_fw(output.narrow(0,startd,numd), products.view(-1,self._out_channels), self._degs_gpu.narrow(0,startd,numd))
  File "/cvgl2/u/mtchapmi/superpoint_graph/learning/ecc/cuda_kernels.py", line 120, in conv_aggregate_fw
    function, stream = get_kernel_func('conv_aggregate_fw_kernel_v2', conv_aggregate_fw_kernel_v2(), get_dtype(src))
  File "/cvgl2/u/mtchapmi/superpoint_graph/learning/ecc/cuda_kernels.py", line 38, in get_kernel_func
    module.load(bytes(ptx.encode()))
  File "cupy/cuda/function.pyx", line 181, in cupy.cuda.function.Module.load
  File "cupy/cuda/function.pyx", line 183, in cupy.cuda.function.Module.load
  File "cupy/cuda/driver.pyx", line 185, in cupy.cuda.driver.moduleLoadData
  File "cupy/cuda/driver.pyx", line 81, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_INVALID_PTX: a PTX JIT compilation failed

loicland commented 5 years ago

Hi, do you get the error at the first iteration, or are you able to train for some epochs?

I'll page mys007, who knows the CUDA kernels better than I do.

micaeltchapmi commented 5 years ago

It happens at the first epoch.

mys007 commented 5 years ago

Thanks for your report, @micaeltchapmi. After some googling, my guess is that this could be related to https://github.com/loicland/superpoint_graph/issues/37#issuecomment-394787840 and https://github.com/NVIDIA/FastPhotoStyle/issues/20 . Could you try installing cupy==4.5.0 and, if that doesn't help, uninstalling cupy and installing cupy-cuda90 instead?
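
A note for readers trying the suggestion above: CuPy's installation guide asks that only one CuPy package be installed at a time, so a leftover cupy next to cupy-cuda90 is worth ruling out. Below is a minimal sketch that uses only the standard library. The helper names filter_cupy_packages and installed_cupy_packages are made up for illustration and are not part of CuPy or this repository.

```python
from importlib import metadata


def filter_cupy_packages(names):
    """Return the sorted, lower-cased names that look like CuPy distributions."""
    found = set()
    for name in names:
        # Normalise case and underscores so "CuPy_CUDA90" matches "cupy-cuda90".
        normalised = name.lower().replace("_", "-")
        if normalised == "cupy" or normalised.startswith("cupy-"):
            found.add(normalised)
    return sorted(found)


def installed_cupy_packages():
    """List the CuPy distributions visible to the current interpreter."""
    names = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that carry no name
            names.append(name)
    return filter_cupy_packages(names)


if __name__ == "__main__":
    packages = installed_cupy_packages()
    print("CuPy packages found:", packages or "none")
    if len(packages) > 1:
        print("More than one CuPy package is installed; uninstall all but one.")
```

Run it inside the conda environment used for training. An empty list or a list with exactly one entry is the expected state.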

micaeltchapmi commented 5 years ago

Hi @mys007, thanks for your reply. I installed cupy==4.5.0 as you suggested but still got the same error. I had already tried cupy-cuda90, and that didn't work either. I'm not sure what went wrong, but after uninstalling and reinstalling the nvidia-390 drivers and rebooting my system, it seems to work. I'm now able to train without any problems. I hope this solution works for everyone who encounters this error.
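
Since a driver reinstall resolved it here, one plausible but unconfirmed explanation is that the loaded driver was older than the CUDA toolkit that produced the PTX. A rough sketch for comparing the two versions follows. It assumes CuPy is importable and relies on cupy.cuda.runtime.driverGetVersion() and cupy.cuda.runtime.runtimeGetVersion(); the helper names decode_cuda_version, driver_is_older and report_versions are hypothetical.

```python
def decode_cuda_version(code):
    """Convert CUDA's integer version (1000*major + 10*minor) to (major, minor)."""
    return code // 1000, (code % 1000) // 10


def driver_is_older(driver_code, runtime_code):
    """True when the driver supports an older CUDA version than the runtime."""
    return decode_cuda_version(driver_code) < decode_cuda_version(runtime_code)


def report_versions():
    """Print the driver and runtime CUDA versions if CuPy can query them."""
    try:
        import cupy  # imported lazily so the helpers above work without a GPU
    except ImportError:
        print("CuPy is not importable in this environment.")
        return
    try:
        driver = cupy.cuda.runtime.driverGetVersion()
        runtime = cupy.cuda.runtime.runtimeGetVersion()
    except Exception as exc:  # e.g. no driver loaded
        print("Could not query CUDA versions:", exc)
        return
    print("Driver supports CUDA %d.%d" % decode_cuda_version(driver))
    print("Runtime is CUDA %d.%d" % decode_cuda_version(runtime))
    if driver_is_older(driver, runtime):
        print("The driver is older than the runtime; updating the driver may help.")


if __name__ == "__main__":
    report_versions()
```

A matching or newer driver does not prove the setup is healthy. It only rules out this one mismatch.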

sandeepnmenon commented 3 years ago

@loicland I am able to reproduce this error, so I am creating new issue #250. I get the error at the beginning of the first epoch while trying to run the resume script for vKITTI.

Total number of parameters: 53601
Module(
  (ecc): GraphNetwork(
    (0): RNNGraphConvModule(
      (_cell): GRUCellEx(
        32, 32
        (ini): InstanceNorm1d(1, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
        (inh): InstanceNorm1d(1, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
        (ig): Linear(in_features=32, out_features=32, bias=True)
      )(ingate layernorm)
      (_fnet): Sequential(
        (0): Linear(in_features=13, out_features=32, bias=True)
        (1): ReLU(inplace=True)
        (2): Linear(in_features=32, out_features=128, bias=True)
        (3): ReLU(inplace=True)
        (4): Linear(in_features=128, out_features=64, bias=True)
        (5): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (6): ReLU(inplace=True)
        (7): Linear(in_features=64, out_features=32, bias=False)
      )
    )
    (1): Linear(in_features=32, out_features=13, bias=True)
  )
  (ptn): PointNet(
    (stn): STNkD(
      (convs): Sequential(
        (0): Conv1d(9, 32, kernel_size=(1,), stride=(1,))
        (1): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv1d(32, 64, kernel_size=(1,), stride=(1,))
        (4): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): ReLU(inplace=True)
      )
      (fcs): Sequential(
        (0): Linear(in_features=64, out_features=32, bias=True)
        (1): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Linear(in_features=32, out_features=16, bias=True)
        (4): BatchNorm1d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): ReLU(inplace=True)
      )
      (proj): Linear(in_features=16, out_features=4, bias=True)
    )
    (convs): Sequential(
      (0): Conv1d(9, 64, kernel_size=(1,), stride=(1,))
      (1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv1d(64, 64, kernel_size=(1,), stride=(1,))
      (4): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
      (6): Conv1d(64, 128, kernel_size=(1,), stride=(1,))
      (7): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (8): ReLU(inplace=True)
    )
    (fcs): Sequential(
      (0): Linear(in_features=129, out_features=64, bias=True)
      (1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Linear(in_features=64, out_features=32, bias=True)
      (4): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
      (6): Linear(in_features=32, out_features=32, bias=True)
    )
  )
)
  0%|                                     | 0/15 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "./learning/main.py", line 459, in <module>
    main()
  File "./learning/main.py", line 381, in main
    acc_test, oacc_test, avg_iou_test, per_class_iou_test, predictions_test, avg_acc_test, confusion_matrix = eval_final()
  File "./learning/main.py", line 287, in eval_final
    outputs = model.ecc(embeddings)
  File "/home/deepenai/anaconda3/envs/superpoint/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/deepenai/SandeepMenon/superpoint/superpoint_graph/learning/../learning/graphnet.py", line 97, in forward
    input = module(input)
  File "/home/deepenai/anaconda3/envs/superpoint/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/deepenai/SandeepMenon/superpoint/superpoint_graph/learning/../learning/modules.py", line 175, in forward
    input = ecc.GraphConvFunction.apply(hx, weights, nc, nc, idxn, idxe, degs, degs_gpu,
  File "/home/deepenai/SandeepMenon/superpoint/superpoint_graph/learning/../learning/ecc/GraphConvModule.py", line 79, in forward
    cuda_kernels.conv_aggregate_fw(output.narrow(0, startd, numd), products.view(-1, ctx._out_channels),
  File "/home/deepenai/SandeepMenon/superpoint/superpoint_graph/learning/../learning/ecc/cuda_kernels.py", line 125, in conv_aggregate_fw
    function, stream = get_kernel_func('conv_aggregate_fw_kernel_v2', conv_aggregate_fw_kernel_v2(), get_dtype(src))
  File "/home/deepenai/SandeepMenon/superpoint/superpoint_graph/learning/../learning/ecc/cuda_kernels.py", line 43, in get_kernel_func
    module.load(bytes(ptx.encode()))
  File "cupy/cuda/function.pyx", line 241, in cupy.cuda.function.Module.load
  File "cupy/cuda/function.pyx", line 243, in cupy.cuda.function.Module.load
  File "cupy_backends/cuda/api/driver.pyx", line 246, in cupy_backends.cuda.api.driver.moduleLoadData
  File "cupy_backends/cuda/api/driver.pyx", line 124, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_INVALID_PTX: a PTX JIT compilation failed

Environment details:
Pytorch: 1.8.1
cupy: 8.6.0
CUDA: 11.0
Torch geometric: 1.7.0
gcc and g++: 7.5.0-3ubuntu1~18.04

I know the versions are higher than what is mentioned in the readme, but in this same environment I was able to run the "Learned Partition script" for training using the shell script, and I was able to run its quality evaluation script as well.

rttariverdi67 commented 2 years ago

Hi all, I was facing the same error. I solved it by using a Docker image and reinstalling (NVIDIA) Docker itself (make sure you uninstall everything related to it first). Don't forget to use the "--gpus all" flag in Docker and/or "--nv" in Singularity! Inside the container, it is good practice to check that nvidia-smi and nvcc -V both work properly.
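
The container check described above can be scripted. The following is a minimal sketch using only the Python standard library. The function names locate_tools and run_check are illustrative, and the only commands it runs are the nvidia-smi and nvcc -V mentioned in the comment.

```python
import shutil
import subprocess


def locate_tools(names=("nvidia-smi", "nvcc")):
    """Map each tool name to its path on PATH, or None if it is not found."""
    return {name: shutil.which(name) for name in names}


def run_check(command):
    """Run a command; return its exit code, or None if the executable is missing."""
    if shutil.which(command[0]) is None:
        return None
    completed = subprocess.run(command, capture_output=True, text=True, check=False)
    return completed.returncode


if __name__ == "__main__":
    for tool, path in locate_tools().items():
        print(tool, "->", path or "not found on PATH")
    for command in (["nvidia-smi"], ["nvcc", "-V"]):
        code = run_check(command)
        status = "missing" if code is None else "exit code %d" % code
        print(" ".join(command), ":", status)
```

If nvidia-smi is missing inside the container, that would be consistent with the container having been started without "--gpus all" (or "--nv" in Singularity).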

loicland / superpoint_graph

cupy.cuda.driver.CUDADriverError: CUDA_ERROR_INVALID_PTX: a PTX JIT compilation failed #105