utterworks / fast-bert

Super easy library for BERT based NLP models

RuntimeError: cuda runtime error: no CUDA-capable device is detected at /pytorch/aten/src/THC/THCGeneral.cpp:47 #232

Closed · veilupt closed this issue 4 years ago

veilupt commented 4 years ago

I have created a new virtual machine on Google Cloud Platform and installed the packages mentioned in README.md.

Is a CUDA device (via the transformers package) mandatory to run a fast-bert model? If not, how can I train and save the model without a GPU?

@kaushaltrivedi Kindly help me resolve this.

Sample code:

import torch
from fast_bert.learner_cls import BertLearner

device_cuda = torch.device("cuda")
learner = BertLearner.from_pretrained_model(
    databunch,
    pretrained_path='bert-base-uncased',
    metrics=metrics,
    device=device_cuda,
    logger=logger,
    output_dir=OUTPUT_DIR,
    finetuned_wgts_path=None,
    warmup_steps=500,
    multi_gpu=True,
    is_fp16=True,
    multi_label=False,
    logging_steps=50)

Error logs:

THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=47 error=100 : no CUDA-capable device is detected
Traceback (most recent call last):
  File "bert/run.py", line 139, in <module>
    multi_label=True, logging_steps=0)
  File "/home/gcp/base/lib/python3.7/site-packages/fast_bert/learner_cls.py", line 163, in from_pretrained_model
    model.to(device)
  File "/home/gcp/base/lib/python3.7/site-packages/torch/nn/modules/module.py", line 443, in to
    return self._apply(convert)
  File "/home/gcp/base/lib/python3.7/site-packages/torch/nn/modules/module.py", line 203, in _apply
    module._apply(fn)
  File "/home/gcp/base/lib/python3.7/site-packages/torch/nn/modules/module.py", line 203, in _apply
    module._apply(fn)
  File "/home/gcp/base/lib/python3.7/site-packages/torch/nn/modules/module.py", line 203, in _apply
    module._apply(fn)
  File "/home/gcp/base/lib/python3.7/site-packages/torch/nn/modules/module.py", line 225, in _apply
    param_applied = fn(param)
  File "/home/gcp/base/lib/python3.7/site-packages/torch/nn/modules/module.py", line 441, in convert
    return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
  File "/home/gcp/base/lib/python3.7/site-packages/torch/cuda/__init__.py", line 153, in _lazy_init
    torch._C._cuda_init()
RuntimeError: cuda runtime error (100) : no CUDA-capable device is detected at /pytorch/aten/src/THC/THCGeneral.cpp:47
kaushaltrivedi commented 4 years ago

If you don't have a GPU, change the device setting to CPU:

device_cpu = torch.device("cpu")
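
For reference, a minimal CPU-only version of the original snippet would look like the sketch below (assuming the same databunch, metrics, and logger objects). Note that multi_gpu and is_fp16 also have to be False on a machine without CUDA, as the follow-up below shows for fp16:

import torch
from fast_bert.learner_cls import BertLearner

# Run entirely on the CPU: no CUDA device, no multi-GPU, no apex mixed precision
device_cpu = torch.device("cpu")
learner = BertLearner.from_pretrained_model(
    databunch,
    pretrained_path='bert-base-uncased',
    metrics=metrics,
    device=device_cpu,
    logger=logger,
    output_dir=OUTPUT_DIR,
    finetuned_wgts_path=None,
    warmup_steps=500,
    multi_gpu=False,  # no GPUs to parallelize across
    is_fp16=False,    # apex amp requires CUDA tensors
    multi_label=False,
    logging_steps=50)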

veilupt commented 4 years ago

@kaushaltrivedi After changing that line, I get the following error while running multi-label classification:

Traceback (most recent call last):
  File "bert/run.py", line 142, in <module>
    learner.fit(args.num_train_epochs, args.learning_rate, validate=True)
  File "/home/pt4_gcp/.local/lib/python3.7/site-packages/fast_bert/learner_cls.py", line 351, in fit
    self.model, optimizer, opt_level=self.fp16_opt_level
  File "/home/pt4_gcp/.local/lib/python3.7/site-packages/apex/amp/frontend.py", line 358, in initialize
    return _initialize(models, optimizers, _amp_state.opt_properties, num_losses, cast_model_outputs)
  File "/home/pt4_gcp/.local/lib/python3.7/site-packages/apex/amp/_initialize.py", line 171, in _initialize
    check_params_fp32(models)
  File "/home/pt4_gcp/.local/lib/python3.7/site-packages/apex/amp/_initialize.py", line 93, in check_params_fp32
    name, param.type()))
  File "/home/pt4_gcp/.local/lib/python3.7/site-packages/apex/amp/_amp_state.py", line 32, in warn_or_err
    raise RuntimeError(msg)
RuntimeError: Found param bert.embeddings.word_embeddings.weight with type torch.FloatTensor, expected torch.cuda.FloatTensor.
When using amp.initialize, you need to provide a model with parameters
located on a CUDA device before passing it no matter what optimization level
you chose. Use model.to('cuda') to use the default device.
veilupt commented 4 years ago

@kaushaltrivedi Yes, fixed by:

model = TransformerModel('bert', 'bert-base-cased', use_cuda=False, args={'fp16': False})

https://github.com/ThilinaRajapakse/simpletransformers/issues/32#issuecomment-551051328
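
Note that TransformerModel(..., use_cuda=False, args={'fp16': False}) is the API of the simpletransformers library from the linked issue, not of fast-bert; the fast-bert equivalent is to pass a CPU device and is_fp16=False to BertLearner.from_pretrained_model, as sketched above. To make one script run on both GPU and CPU machines, the flags can also be gated on CUDA availability. A minimal sketch:

import torch

# Use CUDA only when it is actually present; apex fp16 and multi-GPU
# training both depend on CUDA, so they are gated on the same check.
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
multi_gpu = use_cuda and torch.cuda.device_count() > 1
is_fp16 = use_cuda

These values can then replace the hard-coded device_cuda, multi_gpu=True, and is_fp16=True arguments in the original snippet.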