mengziyi64 / TSA-Net

End-to-End Low Cost Compressive Spectral Imaging with Spatial-Spectral Self-Attention

RuntimeError: CUDA error: invalid configuration argument #7

Closed bryanbocao closed 1 year ago

bryanbocao commented 1 year ago
TSA-Net/TSA_pytorch# python3 test.py 
Traceback (most recent call last):
  File "test.py", line 67, in <module>
    main()    
  File "test.py", line 61, in main
    (pred, truth, psnr_all, ssim_all, psnr_mean, ssim_mean) = test(last_train)
  File "test.py", line 37, in test
    model_out = model(test_PhiTy)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1015, in _call_impl
    return forward_call(*input, **kwargs)
  File "/share/home/brcao/Repos/fork/TSA-Net/TSA_pytorch/models.py", line 38, in forward
    enc1,enc1_pre = self.tconv_down1(x)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1015, in _call_impl
    return forward_call(*input, **kwargs)
  File "/share/home/brcao/Repos/fork/TSA-Net/TSA_pytorch/models.py", line 77, in forward
    feat_pool = self.pool(feat) if self.pool is not None else feat            
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1015, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/pooling.py", line 162, in forward
    return F.max_pool2d(input, self.kernel_size, self.stride,
  File "/opt/conda/lib/python3.8/site-packages/torch/_jit_internal.py", line 396, in fn
    return if_false(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py", line 699, in _max_pool2d
    return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
RuntimeError: CUDA error: invalid configuration argument
python -c 'import torch; print(torch.__version__)'
1.9.0a0+2ecb2c7
nvidia-smi
Sun Mar 19 20:04:28 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:04:00.0 Off |                  N/A |
|  0%   44C    P8    22W / 350W |     15MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:0A:00.0  On |                  N/A |
| 30%   47C    P8    33W / 350W |   2640MiB / 24234MiB |     17%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1228      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A      2447      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A      2763      G   /usr/lib/xorg/Xorg                  4MiB |
|    1   N/A  N/A      1228      G   /usr/lib/xorg/Xorg                102MiB |
|    1   N/A  N/A      2447      G   /usr/lib/xorg/Xorg                102MiB |
|    1   N/A  N/A      2763      G   /usr/lib/xorg/Xorg               1310MiB |
|    1   N/A  N/A      2896      G   /usr/bin/gnome-shell              124MiB |
|    1   N/A  N/A      3825      G   ...198883770977238872,131072      628MiB |
|    1   N/A  N/A     11711      G   ...RendererForSitePerProcess      141MiB |
+-----------------------------------------------------------------------------+

GPU: NVIDIA GeForce RTX 3090.

Any help would be appreciated. Thanks!

bryanbocao commented 1 year ago
pip install torch==1.2.0 torchvision==0.4.0
TSA-Net/TSA_pytorch# python3 test.py
Traceback (most recent call last):
  File "/usr/lib/python3.6/tarfile.py", line 189, in nti
    n = int(s.strip() or "0", 8)
ValueError: invalid literal for int() with base 8: 'ct\nq\x05)Rq'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/tarfile.py", line 2299, in next
    tarinfo = self.tarinfo.fromtarfile(self)
  File "/usr/lib/python3.6/tarfile.py", line 1093, in fromtarfile
    obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
  File "/usr/lib/python3.6/tarfile.py", line 1035, in frombuf
    chksum = nti(buf[148:156])
  File "/usr/lib/python3.6/tarfile.py", line 191, in nti
    raise InvalidHeaderError("invalid header")
tarfile.InvalidHeaderError: invalid header

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 555, in _load
    return legacy_load(f)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 466, in legacy_load
    with closing(tarfile.open(fileobj=f, mode='r:', format=tarfile.PAX_FORMAT)) as tar, \
  File "/usr/lib/python3.6/tarfile.py", line 1591, in open
    return func(name, filemode, fileobj, **kwargs)
  File "/usr/lib/python3.6/tarfile.py", line 1621, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/usr/lib/python3.6/tarfile.py", line 1484, in __init__
    self.firstmember = self.next()
  File "/usr/lib/python3.6/tarfile.py", line 2311, in next
    raise ReadError(str(e))
tarfile.ReadError: invalid header

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 29, in <module>
    model = torch.load('./model/' + model_save_filename + '/model_epoch_{}.pth'.format(last_train))    
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 386, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 559, in _load
    raise RuntimeError("{} is a zip archive (did you mean to use torch.jit.load()?)".format(f.name))
RuntimeError: ./model/model/model_epoch_80.pth is a zip archive (did you mean to use torch.jit.load()?)
bryanbocao commented 1 year ago
pip install torch==1.4.0 torchvision==0.5.0
TSA-Net/TSA_pytorch# python3 test.py 
Traceback (most recent call last):
  File "test.py", line 29, in <module>
    model = torch.load('./model/' + model_save_filename + '/model_epoch_{}.pth'.format(last_train))    
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 527, in load
    with _open_zipfile_reader(f) as opened_zipfile:
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 224, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: version_ <= kMaxSupportedFileFormatVersion INTERNAL ASSERT FAILED at /pytorch/caffe2/serialize/inline_container.cc:132, please report a bug to PyTorch. Attempted to read a PyTorch file with version 3, but the maximum supported version for reading is 2. Your PyTorch installation may be too old. (init at /pytorch/caffe2/serialize/inline_container.cc:132)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7f20c5de8193 in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so)
frame #1: caffe2::serialize::PyTorchStreamReader::init() + 0x1f5b (0x7f20375529eb in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch.so)
frame #2: caffe2::serialize::PyTorchStreamReader::PyTorchStreamReader(std::string const&) + 0x64 (0x7f2037553c04 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch.so)
frame #3: <unknown function> + 0x6c53a6 (0x7f20df2543a6 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0x2961c4 (0x7f20dee251c4 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #6: python3() [0x594a71]
frame #7: python3() [0x54a035]
frame #8: python3() [0x5515c1]
frame #10: python3() [0x50a433]
frame #12: python3() [0x507be4]
frame #14: python3() [0x594a01]
frame #15: python3() [0x549e8f]
frame #16: python3() [0x5515c1]
frame #18: python3() [0x50a433]
frame #20: python3() [0x507be4]
frame #21: python3() [0x509900]
frame #22: python3() [0x50a2fd]
frame #24: python3() [0x507be4]
frame #26: python3() [0x634e72]
frame #31: __libc_start_main + 0xe7 (0x7f20e3a37b97 in /lib/x86_64-linux-gnu/libc.so.6)
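
Both load failures above (torch 1.2.0 and 1.4.0) are consistent with the checkpoint having been written in the zip-based container that `torch.save` has used by default since PyTorch 1.6, which older releases cannot read. As a minimal sketch, the container format can be checked with the standard library alone, without importing torch. The helper name `checkpoint_format` is hypothetical, and treating every non-zip, non-tar file as a plain pickle stream is an assumption.

```python
import tarfile
import zipfile


def checkpoint_format(path):
    """Classify a checkpoint file by its container format.

    Hypothetical helper, not part of PyTorch or TSA-Net.
    Returns "zip" for the zip-based container (torch.save default since
    PyTorch 1.6), "tar" for the very old tar-based container, and
    "legacy-pickle" for anything else (assumed to be a plain pickle stream).
    """
    if zipfile.is_zipfile(path):
        return "zip"
    if tarfile.is_tarfile(path):
        return "tar"
    return "legacy-pickle"
```

If `checkpoint_format('./model/model/model_epoch_80.pth')` returns `"zip"`, that would suggest a PyTorch release of 1.6 or newer is needed to load it.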
bryanbocao commented 1 year ago
pip install torch==1.5.0 torchvision==0.6.0
TSA-Net/TSA_pytorch# python3 test.py 
loaded!
Traceback (most recent call last):
  File "test.py", line 68, in <module>
    main()    
  File "test.py", line 62, in main
    (pred, truth, psnr_all, ssim_all, psnr_mean, ssim_mean) = test(last_train)
  File "test.py", line 38, in test
    model_out = model(test_PhiTy)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/share/home/brcao/Repos/fork/TSA-Net/TSA_pytorch/models.py", line 38, in forward
    enc1,enc1_pre = self.tconv_down1(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/share/home/brcao/Repos/fork/TSA-Net/TSA_pytorch/models.py", line 77, in forward
    feat_pool = self.pool(feat) if self.pool is not None else feat            
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/pooling.py", line 141, in forward
    self.return_indices)
  File "/usr/local/lib/python3.6/dist-packages/torch/_jit_internal.py", line 209, in fn
    return if_false(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 539, in _max_pool2d
    input, kernel_size, stride, padding, dilation, ceil_mode)
RuntimeError: non-empty 3D or 4D input tensor expected but got ndim: 4
bryanbocao commented 1 year ago

test.py

# test_path = "./Data/Kaist_test/" 
test_path = '../TSA_Net_simulation/Data/Testing_data/'
TSA-Net/TSA_pytorch# python3 test.py 

 np.shape(mask3d_batch):  torch.Size([32, 28, 256, 256])
0 (256, 256, 28) 0.91325766 0.004317734
1 (256, 256, 28) 0.6268645 0.0024493488
2 (256, 256, 28) 0.7478274 0.0038342373
3 (256, 256, 28) 0.8852826 0.0023525485
4 (256, 256, 28) 0.90493804 0.0034019176
5 (256, 256, 28) 1.0582489 0.000809521
6 (256, 256, 28) 0.5335211 0.008445868
7 (256, 256, 28) 1.0526695 0.0
8 (256, 256, 28) 0.8439114 0.0029929152
9 (256, 256, 28) 0.93364686 0.0029743377
loaded!

 np.shape(test_PhiTy):  torch.Size([10, 28, 256, 256])

 models.py - Encoder_Triblock - np.shape(feat):  torch.Size([10, 64, 256, 256])

 models.py - Encoder_Triblock - np.shape(feat):  torch.Size([10, 128, 128, 128])
Traceback (most recent call last):
  File "test.py", line 73, in <module>
    main()    
  File "test.py", line 67, in main
    (pred, truth, psnr_all, ssim_all, psnr_mean, ssim_mean) = test(last_train)
  File "test.py", line 43, in test
    model_out = model(test_PhiTy)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/share/home/brcao/Repos/fork/TSA-Net/TSA_pytorch/models.py", line 40, in forward
    enc3,enc3_pre = self.tconv_down3(enc2)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/share/home/brcao/Repos/fork/TSA-Net/TSA_pytorch/models.py", line 75, in forward
    feat = self.layer2(feat)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/share/home/brcao/Repos/fork/TSA-Net/TSA_pytorch/architecture/ResidualFeat.py", line 33, in forward
    out = self.bn_init(out)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/batchnorm.py", line 106, in forward
    exponential_average_factor, self.eps)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 1923, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

GPU 1 has 20+ GB of memory available:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:04:00.0 Off |                  N/A |
| 41%   36C    P8    15W / 260W |     18MiB / 11019MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:0A:00.0  On |                  N/A |
| 30%   51C    P2   108W / 350W |   2028MiB / 24234MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1163      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A      2745      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A      3086      G   /usr/lib/xorg/Xorg                  4MiB |
|    1   N/A  N/A      1163      G   /usr/lib/xorg/Xorg                102MiB |
|    1   N/A  N/A      2745      G   /usr/lib/xorg/Xorg                102MiB |
|    1   N/A  N/A      3086      G   /usr/lib/xorg/Xorg                756MiB |
|    1   N/A  N/A      3219      G   /usr/bin/gnome-shell              112MiB |
|    1   N/A  N/A      3762      G   ...037349194217348053,131072      308MiB |
|    1   N/A  N/A      5949      G   ...RendererForSitePerProcess      120MiB |
|    1   N/A  N/A     72547      C   python3                           287MiB |
+-----------------------------------------------------------------------------+
bryanbocao commented 1 year ago
pip install torch==1.6.0 torchvision==0.7.0
TSA-Net/TSA_pytorch# python3 test.py 
/usr/local/lib/python3.6/dist-packages/torch/cuda/__init__.py:125: UserWarning: 
NVIDIA GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75.
If you want to use the NVIDIA GeForce RTX 3090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))

 np.shape(mask3d_batch):  torch.Size([32, 28, 256, 256])
0 (256, 256, 28) 0.91325766 0.004317734
1 (256, 256, 28) 0.6268645 0.0024493488
2 (256, 256, 28) 0.7478274 0.0038342373
3 (256, 256, 28) 0.8852826 0.0023525485
4 (256, 256, 28) 0.90493804 0.0034019176
5 (256, 256, 28) 1.0582489 0.000809521
6 (256, 256, 28) 0.5335211 0.008445868
7 (256, 256, 28) 1.0526695 0.0
8 (256, 256, 28) 0.8439114 0.0029929152
9 (256, 256, 28) 0.93364686 0.0029743377
loaded!
Traceback (most recent call last):
  File "test.py", line 73, in <module>
    main()    
  File "test.py", line 67, in main
    (pred, truth, psnr_all, ssim_all, psnr_mean, ssim_mean) = test(last_train)
  File "test.py", line 37, in test
    test_gt = test_data.cuda().float()
RuntimeError: CUDA error: no kernel image is available for execution on the device
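
The warning above lists the architectures this build was compiled for (sm_37 through sm_75), while the RTX 3090 reports sm_86. A rough sketch of that compatibility check follows. The function name `build_supports_device` is hypothetical, and the rule is a simplification: it only considers precompiled `sm_XY` entries with the same major version and a minor version no higher than the device's, and it ignores PTX just-in-time compilation.

```python
def build_supports_device(arch_list, capability):
    """Return True if any precompiled arch in arch_list can run on the device.

    Hypothetical helper. arch_list is a list such as ["sm_70", "sm_75"];
    capability is a (major, minor) tuple such as (8, 6).
    In recent PyTorch releases these values can be read from
    torch.cuda.get_arch_list() and torch.cuda.get_device_capability(0).
    """
    major, minor = capability
    for arch in arch_list:
        if not arch.startswith("sm_"):
            continue  # skip PTX entries such as "compute_80"
        arch_major, arch_minor = divmod(int(arch[len("sm_"):]), 10)
        # Simplified rule: same major version, build minor <= device minor.
        if arch_major == major and arch_minor <= minor:
            return True
    return False


# Architecture list copied from the warning above; (8, 6) is sm_86.
supported = "sm_37 sm_50 sm_60 sm_70 sm_75".split()
print(build_supports_device(supported, (8, 6)))  # False
```

A `False` result here matches the "no kernel image is available" error: none of the compiled kernels target the 8.x architecture.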
bryanbocao commented 1 year ago

Solved:

pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
TSA-Net/TSA_pytorch# python3 test.py 

 np.shape(mask3d_batch):  torch.Size([32, 28, 256, 256])
0 (256, 256, 28) 0.91325766 0.004317734
1 (256, 256, 28) 0.6268645 0.0024493488
2 (256, 256, 28) 0.7478274 0.0038342373
3 (256, 256, 28) 0.8852826 0.0023525485
4 (256, 256, 28) 0.90493804 0.0034019176
5 (256, 256, 28) 1.0582489 0.000809521
6 (256, 256, 28) 0.5335211 0.008445868
7 (256, 256, 28) 1.0526695 0.0
8 (256, 256, 28) 0.8439114 0.0029929152
9 (256, 256, 28) 0.93364686 0.0029743377
loaded!

 np.shape(test_PhiTy):  torch.Size([10, 28, 256, 256])

 models.py - Encoder_Triblock - np.shape(feat):  torch.Size([10, 64, 256, 256])
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  /pytorch/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)

 models.py - Encoder_Triblock - np.shape(feat):  torch.Size([10, 128, 128, 128])

 models.py - Encoder_Triblock - np.shape(feat):  torch.Size([10, 256, 64, 64])

 models.py - Encoder_Triblock - np.shape(feat):  torch.Size([10, 512, 32, 32])
===> Epoch 80: testing psnr = 30.24, ssim = 0.898, time: 0.08
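
The working install uses `+cu111` wheels (built against CUDA 11.1), and the `nvidia-smi` header earlier in the thread reports `CUDA Version: 11.4`, which is the newest CUDA version the installed driver supports. As a small sketch, the wheel tag can be compared against that driver value. The helper names `wheel_cuda_version` and `driver_can_run` are hypothetical, the tag is assumed to have the form `+cuXY` or `+cuXYZ` with the last digit as the minor version, and the comparison is deliberately conservative.

```python
import re


def wheel_cuda_version(torch_version):
    """Extract (major, minor) CUDA version from a tag like '1.9.0+cu111'.

    Hypothetical helper. Returns None when there is no '+cuNNN' suffix,
    for example for source builds such as '1.9.0a0+2ecb2c7'.
    """
    match = re.search(r"\+cu(\d{2,})$", torch_version)
    if match is None:
        return None
    digits = match.group(1)
    # Assumption: the last digit is the minor version, the rest the major.
    return int(digits[:-1]), int(digits[-1])


def driver_can_run(torch_version, driver_cuda):
    """Conservative check: wheel CUDA version must not exceed the driver's."""
    wheel = wheel_cuda_version(torch_version)
    return wheel is not None and wheel <= driver_cuda


print(wheel_cuda_version("1.9.0+cu111"))       # (11, 1)
print(driver_can_run("1.9.0+cu111", (11, 4)))  # True
```

In recent PyTorch releases, the same information is also available at runtime as `torch.__version__` and `torch.version.cuda`.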
dw0127 commented 8 months ago

great!