microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

new_grad_tensor.copy_(param.grad.view(-1)) AttributeError: 'NoneType' object has no attribute 'view' #700

Open ghost opened 3 years ago

ghost commented 3 years ago

I'm trying to apply DeepSpeed ZeRO stage 2 to StyleGAN2, but I get this error.

Here's my config:

{
  "train_batch_size": 4,
  "optimizer": {
    "type": "Adam",
    "params": {
      "lr": 0.0002,
      "betas": [0.5, 0.999],
      "eps": 1e-8
    }
  },
  "steps_per_print": 10,
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 2,
    "cpu_offload": true,
    "contiguous_gradients": true,
    "overlap_comm": true
  }
}
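
(Editorial sketch, not from the original report: a config like this is typically passed to deepspeed.initialize, which in recent releases accepts the dict directly via its config argument. The model below is a placeholder, since the actual training script isn't shown; note also that newer DeepSpeed versions replace the stage-2 "cpu_offload": true flag with "offload_optimizer": {"device": "cpu"}.)

import deepspeed
import torch

# Placeholder model standing in for the StyleGAN2 generator.
model = torch.nn.Linear(512, 512)

ds_config = {
    "train_batch_size": 4,
    "optimizer": {"type": "Adam",
                  "params": {"lr": 2e-4, "betas": [0.5, 0.999], "eps": 1e-8}},
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2, "cpu_offload": True,
                          "contiguous_gradients": True, "overlap_comm": True},
}

# Returns (engine, optimizer, dataloader, lr_scheduler); backward and step
# then go through the engine, as in the trace below (model_engineG.backward).
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)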

And here's the full stack trace:

Traceback (most recent call last):
  File "stylegan2_pytorch/ucl_deepspeed.py", line 200, in <module>
    main()
  File "stylegan2_pytorch/ucl_deepspeed.py", line 197, in main
    train_from_folder(deepspeed_args=deepspeed_args)
  File "stylegan2_pytorch/ucl_deepspeed.py", line 177, in train_from_folder
    run_training(0, 1, model_args, data, load_from, new, num_train_steps, name, seed)
  File "stylegan2_pytorch/ucl_deepspeed.py", line 62, in run_training
    retry_call(model.train, tries=3, exceptions=NanException)
  File "/opt/conda/lib/python3.7/site-packages/retry/api.py", line 101, in retry_call
    return __retry_internal(partial(f, *args, **kwargs), exceptions, tries, delay, max_delay, backoff, jitter, logger)
  File "/opt/conda/lib/python3.7/site-packages/retry/api.py", line 33, in __retry_internal
    return f()
  File "/home/dtkatch/stylegan2-pytorch/stylegan2_pytorch/stylegan2_pytorch.py", line 1052, in train
    self.GAN.model_engineG.backward(gen_loss)
  File "/opt/conda/lib/python3.7/site-packages/deepspeed/runtime/engine.py", line 845, in backward
    self.optimizer.backward(loss)
  File "/opt/conda/lib/python3.7/site-packages/deepspeed/runtime/zero/stage2.py", line 1609, in backward
    self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
  File "/opt/conda/lib/python3.7/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 53, in backward
    scaled_loss.backward(retain_graph=retain_graph)
  File "/opt/conda/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/opt/conda/lib/python3.7/site-packages/torch/autograd/__init__.py", line 132, in backward
    allow_unreachable=True)  # allow_unreachable flag
  File "/opt/conda/lib/python3.7/site-packages/deepspeed/runtime/zero/stage2.py", line 594, in reduce_partition_and_remove_grads
    self.reduce_ready_partitions_and_remove_grads(param, i)
  File "/opt/conda/lib/python3.7/site-packages/deepspeed/runtime/zero/stage2.py", line 984, in reduce_ready_partitions_and_remove_grads
    self.reduce_independent_p_g_buckets_and_remove_grads(param, i)
  File "/opt/conda/lib/python3.7/site-packages/deepspeed/runtime/zero/stage2.py", line 637, in reduce_independent_p_g_buckets_and_remove_grads
    new_grad_tensor.copy_(param.grad.view(-1))
AttributeError: 'NoneType' object has no attribute 'view'
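
(Editorial note, not part of the original report: the failing line copies param.grad into a flat reduction buffer, so the crash means a parameter reached ZeRO stage 2's gradient-reduction path while its grad was still None. Plain PyTorch shows how that can happen: any parameter that requires grad but does not contribute to the loss keeps grad=None after backward(), which is easy to hit in a GAN that alternates generator and discriminator passes.)

import torch

used = torch.nn.Parameter(torch.ones(3))
unused = torch.nn.Parameter(torch.ones(3))  # requires_grad=True, but absent from the loss

loss = (used * 2.0).sum()
loss.backward()

print(used.grad)    # tensor([2., 2., 2.])
print(unused.grad)  # None, the same failure shape as the trace above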
eltonzheng commented 3 years ago

Thanks @dtkatch for reporting this. Could you provide repro steps? That would help us investigate.

HUAFOR commented 10 months ago

Switch to ZeRO stage 1 instead of stage 2; that resolves it.

miguelscarv commented 9 months ago

Hello @eltonzheng. I am still facing this issue. This is the traceback I'm getting:

Traceback (most recent call last):
  File "/cfs/home/u021543/pheye_llavar_accelerate.py", line 68, in <module>
    accelerator.backward(loss)
  File "/cfs/home/u021543/miniconda3/lib/python3.11/site-packages/accelerate/accelerator.py", line 1958, in backward
    self.deepspeed_engine_wrapped.backward(loss, **kwargs)
  File "/cfs/home/u021543/miniconda3/lib/python3.11/site-packages/accelerate/utils/deepspeed.py", line 167, in backward
    self.engine.backward(loss, **kwargs)
  File "/cfs/home/u021543/miniconda3/lib/python3.11/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^
  File "/cfs/home/u021543/miniconda3/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 1955, in backward
    self.optimizer.backward(loss, retain_graph=retain_graph)
  File "/cfs/home/u021543/miniconda3/lib/python3.11/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 2019, in backward
    self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
  File "/cfs/home/u021543/miniconda3/lib/python3.11/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
    scaled_loss.backward(retain_graph=retain_graph)
  File "/cfs/home/u021543/miniconda3/lib/python3.11/site-packages/torch/_tensor.py", line 492, in backward
    torch.autograd.backward(
  File "/cfs/home/u021543/miniconda3/lib/python3.11/site-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/cfs/home/u021543/miniconda3/lib/python3.11/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 865, in reduce_partition_and_remove_grads
    self.reduce_ready_partitions_and_remove_grads(param, i)
  File "/cfs/home/u021543/miniconda3/lib/python3.11/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 1377, in reduce_ready_partitions_and_remove_grads
    self.reduce_independent_p_g_buckets_and_remove_grads(param, i)
  File "/cfs/home/u021543/miniconda3/lib/python3.11/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 911, in reduce_independent_p_g_buckets_and_remove_grads
    new_grad_tensor.copy_(grad_reduc.view(-1))
                          ^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'view'

Like @HUAFOR said, changing to Stage 1 solves it, but I really need Stage 2.

Providing a code example isn't easy in my case, but I can describe what I'm doing: I added different sets of LoRA adapters to a model (3, to be exact). The model processes images, and for each example I run the same model with a different set of LoRA adapters on the same image at different sizes. The adapters that handle higher resolutions require more forward passes, so the backward pass is much more expensive since it has to record gradients from multiple passes.

TL;DR: I'm doing something that needs to store a lot of gradients for each example, which is why I wanted to use ZeRO stage 2.
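
(Editorial sketch, not the commenter's actual code: the three torch.nn.Linear modules below stand in for the three LoRA adapter sets. When only the active adapter participates in a given backward pass, the inactive adapters' parameters still require grad but end up with grad=None, which matches the NoneType failure in ZeRO stage 2's reduction path.)

import torch

# Three stand-in "adapters"; the real setup uses LoRA adapter sets.
adapters = torch.nn.ModuleList([torch.nn.Linear(8, 8) for _ in range(3)])
x = torch.randn(1, 8)

# Forward through only the first adapter, as if it were the one active
# for this image resolution.
loss = adapters[0](x).sum()
loss.backward()

for i, adapter in enumerate(adapters):
    print(i, adapter.weight.grad is None)
# 0 False  <- the active adapter received gradients
# 1 True   <- inactive adapters keep grad=None
# 2 True

If the inactive adapters truly should not train in a given step, freezing them with requires_grad_(False) before engine initialization (so ZeRO never tracks them) is one possible direction, though this thread does not confirm a fix.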