mgrankin / ru_transformers

Apache License 2.0

Colab notebook GPU issue #40

Closed airogachev closed 3 years ago

airogachev commented 3 years ago

Running the notebook provided here, I get the error below:

Traceback (most recent call last):
  File "run_lm_finetuning.py", line 662, in <module>
    main()
  File "run_lm_finetuning.py", line 630, in main
    global_step, tr_loss = train(args, train_dataset, model, tokenizer)
  File "run_lm_finetuning.py", line 296, in train
    model, optimizer = amp.initialize(model.to('cuda'), optimizer, opt_level=args.fp16_opt_level)
  File "/usr/local/lib/python3.6/dist-packages/apex/amp/frontend.py", line 358, in initialize
    return _initialize(models, optimizers, _amp_state.opt_properties, num_losses, cast_model_outputs)
  File "/usr/local/lib/python3.6/dist-packages/apex/amp/_initialize.py", line 171, in _initialize
    check_params_fp32(models)
  File "/usr/local/lib/python3.6/dist-packages/apex/amp/_initialize.py", line 93, in check_params_fp32
    name, param.type()))
  File "/usr/local/lib/python3.6/dist-packages/apex/amp/_amp_state.py", line 32, in warn_or_err
    raise RuntimeError(msg)
RuntimeError: Found param transformer.wte.weight with type torch.FloatTensor, expected torch.cuda.FloatTensor.
When using amp.initialize, you need to provide a model with parameters
located on a CUDA device before passing it no matter what optimization level
you chose. Use model.to('cuda') to use the default device.

Even after fixing the line to `device = torch.device("cuda")`,

I get another error:
10/21/2020 08:22:51 - INFO - transformers.modeling_utils -   loading weights file gpt2/m_checkpoint-3364613/pytorch_model.bin
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=47 error=100 : no CUDA-capable device is detected
Traceback (most recent call last):
  File "run_lm_finetuning.py", line 662, in <module>
    main()
  File "run_lm_finetuning.py", line 596, in main
    model.to(args.device)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 607, in to
    return self._apply(convert)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 354, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 354, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 376, in _apply
    param_applied = fn(param)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 605, in convert
    return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
  File "/usr/local/lib/python3.6/dist-packages/torch/cuda/__init__.py", line 190, in _lazy_init
    torch._C._cuda_init()
RuntimeError: cuda runtime error (100) : no CUDA-capable device is detected at /pytorch/aten/src/THC/THCGeneral.cpp:47

I'm definitely running a GPU environment. Am I doing something wrong?

airogachev commented 3 years ago

Looks like this is the fix:

%set_env CUDA_VISIBLE_DEVICES=0
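For context: `CUDA_VISIBLE_DEVICES` controls which GPUs the CUDA runtime exposes to the process, and if it is unset or empty when PyTorch initializes CUDA, you can get exactly the `no CUDA-capable device is detected` error above. The `%set_env` magic is IPython's way of setting an environment variable; a minimal plain-Python equivalent (assuming you set it before `torch` or any other CUDA-using library initializes CUDA) would be:

```python
import os

# Expose only GPU 0 to this process. This must be set before the CUDA
# runtime is initialized (i.e. before importing torch or launching the
# training script), otherwise it has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

print(os.environ["CUDA_VISIBLE_DEVICES"])
```

After setting this in a GPU runtime, `torch.cuda.is_available()` should return `True` and `model.to('cuda')` should succeed.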