zhiyuanyou / UniAD

[NeurIPS 2022 Spotlight] A Unified Model for Multi-class Anomaly Detection
Apache License 2.0

RuntimeError: CUDA error: no kernel image is available for execution on the device #18

Closed hihunjin closed 1 year ago

hihunjin commented 1 year ago
[2023-05-19 15:47:23,443][       utils.py][line: 740][    INFO]  not exist, load from https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/efficientnet-b4-6ed6700e.pth
[2023-05-19 15:47:24,572][       utils.py][line: 761][    INFO] Loaded ImageNet pretrained efficientnet-b4
[2023-05-19 15:47:36,139][   train_val.py][line:  90][    INFO] layers: ['backbone', 'neck', 'reconstruction']
[2023-05-19 15:47:36,140][   train_val.py][line:  91][    INFO] active layers: ['reconstruction', 'neck']
=> loading checkpoint './checkpoints/ckpt.pth.tar'
[2023-05-19 15:47:41,654][custom_dataset.py][line:  36][    INFO] building CustomDataset from: ../../data/MVTec-AD/train.json
[2023-05-19 15:47:41,667][custom_dataset.py][line:  36][    INFO] building CustomDataset from: ../../data/MVTec-AD/test.json
Traceback (most recent call last):
  File "../../tools/train_val.py", line 329, in <module>
    main()
  File "../../tools/train_val.py", line 125, in main
    validate(val_loader, model)
  File "../../tools/train_val.py", line 269, in validate
    outputs = model(input)
  File "/usr/local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 445, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/usr/local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/NFS/workspaces/hjha/dev/UniAD/models/model_helper.py", line 49, in forward
    output = submodule(input)
  File "/usr/local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/NFS/workspaces/hjha/dev/UniAD/models/backbones/efficientnet/model.py", line 392, in forward
    x, feat_dict, _ = self.extract_features(image)
  File "/NFS/workspaces/hjha/dev/UniAD/models/backbones/efficientnet/model.py", line 358, in extract_features
    x = self._swish(self._bn0(self._conv_stem(inputs)))
  File "/usr/local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/NFS/workspaces/hjha/dev/UniAD/models/backbones/efficientnet/utils.py", line 101, in forward
    x = x * torch.sigmoid(x)
RuntimeError: CUDA error: no kernel image is available for execution on the device
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.8/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/usr/local/lib/python3.8/site-packages/torch/distributed/launch.py", line 258, in main
    raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['/usr/local/bin/python', '-u', '../../tools/train_val.py', '--local_rank=0', '-e']' returned non-zero exit status 1.

zhiyuanyou commented 1 year ago

Please try running the following:

import torch
torch.tensor([1,2,3]).cuda()

Then share the output.

hihunjin commented 1 year ago
$ python
Python 3.8.12 (default, Nov 17 2021, 16:58:51) 
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.rand(10).cuda()
tensor([0.9758, 0.9910, 0.7123, 0.6851, 0.2408, 0.6267, 0.3766, 0.4412, 0.6818,
        0.7391], device='cuda:0')

zhiyuanyou commented 1 year ago

That is a little weird.

The error suggests that something is wrong with the CUDA setup. However, your test shows that you can create a tensor on the GPU successfully.

How did you run the shell (.sh) scripts?
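
One common cause of "no kernel image is available for execution on the device" is a PyTorch build that was not compiled for the GPU's compute capability. Moving a tensor to the GPU can still succeed in that case, because a plain copy does not need the same compiled kernels as the x * torch.sigmoid(x) call in the traceback. The sketch below is one way to check this, not something taken from the UniAD repository. The helper names arch_supports and cuda_report are hypothetical, and the compatibility rule in arch_supports is a simplified reading of how CUDA binary (sm_XY) and PTX (compute_XY) targets work.

```python
import re


def arch_supports(capability, arch_list):
    """Return True if a build targeting `arch_list` should run on `capability`.

    capability: (major, minor) tuple, e.g. (8, 6).
    arch_list: strings such as "sm_70" (binary) or "compute_70" (PTX).
    Hypothetical helper; entries it cannot parse are ignored.
    """
    major, minor = capability
    for arch in arch_list:
        match = re.fullmatch(r"(sm|compute)_(\d+)(\d)", arch)
        if not match:
            continue
        kind = match.group(1)
        a_major, a_minor = int(match.group(2)), int(match.group(3))
        # Binary kernels run on the same major version with an equal or newer minor.
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        # PTX can be JIT-compiled for any equal or newer device.
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


def cuda_report():
    """Collect basic CUDA diagnostics; hypothetical helper, returns a dict."""
    try:
        import torch
    except ImportError:
        return {"torch": None}

    report = {
        "torch": torch.__version__,
        "built_for_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if not report["cuda_available"]:
        return report

    report["device"] = torch.cuda.get_device_name(0)
    report["capability"] = torch.cuda.get_device_capability(0)

    # get_arch_list() is missing in older PyTorch releases, so guard the call.
    get_arch_list = getattr(torch.cuda, "get_arch_list", None)
    if get_arch_list is not None:
        report["arch_list"] = get_arch_list()
        report["arch_ok"] = arch_supports(report["capability"], report["arch_list"])

    # Run the same elementwise operation that fails in the traceback.
    try:
        x = torch.rand(10, device="cuda")
        (x * torch.sigmoid(x)).sum().item()
        report["kernel_ok"] = True
    except RuntimeError as exc:
        report["kernel_ok"] = False
        report["kernel_error"] = str(exc)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If kernel_ok is False, or arch_ok is False, installing a PyTorch build that matches the GPU and CUDA version is the usual next step. Running the script in the same environment and with the same launcher as the failing command makes the result easier to compare.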