microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

[BUG] my autocast is not working #4908

Open YooSungHyun opened 9 months ago

YooSungHyun commented 9 months ago

Describe the bug
I'm working on ds_train.py in https://github.com/YooSungHyun/pytorch-trainer.

When I run a forward pass with the DeepSpeed fp16 config, the model weights are fp16 but the input data is fp32. I thought auto_cast would handle this, but the following error is raised (a minimal sketch of the mismatch follows the traceback below):

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/data/bart/temp_workspace/pytorch-trainer/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data/bart/temp_workspace/pytorch-trainer/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/bart/temp_workspace/pytorch-trainer/.venv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/data/bart/temp_workspace/pytorch-trainer/.venv/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1836, in forward
    loss = self.module(*inputs, **kwargs)
  File "/data/bart/temp_workspace/pytorch-trainer/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data/bart/temp_workspace/pytorch-trainer/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/bart/temp_workspace/pytorch-trainer/networks/models.py", line 13, in forward
    hidden, _ = self.lstm1(inputs)
  File "/data/bart/temp_workspace/pytorch-trainer/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data/bart/temp_workspace/pytorch-trainer/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/bart/temp_workspace/pytorch-trainer/.venv/lib/python3.10/site-packages/torch/nn/modules/rnn.py", line 879, in forward
    result = _VF.lstm(input, hx, self._flat_weights, self.bias, self.num_layers,
RuntimeError: Input and parameter tensors are not the same dtype, found input tensor with Float and parameter tensor with Half

What did I do wrong?
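For reference, here is a minimal sketch of the mismatch and a manual workaround, using a toy LSTM standing in for the model in the traceback (sizes and names are illustrative, not taken from the repo):

import torch

# Toy stand-in for the LSTM in networks/models.py; under the fp16 config
# DeepSpeed holds the module weights in half precision.
lstm = torch.nn.LSTM(input_size=16, hidden_size=32, batch_first=True).cuda().half()

x = torch.randn(4, 10, 16, device="cuda")  # fp32 batch, as a plain DataLoader yields it
# lstm(x) raises: "Input and parameter tensors are not the same dtype,
# found input tensor with Float and parameter tensor with Half"

# Manual workaround: cast floating-point inputs to the parameter dtype first.
param_dtype = next(lstm.parameters()).dtype  # torch.float16 here
hidden, _ = lstm(x.to(param_dtype))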

To Reproduce
Steps to reproduce the behavior:

  1. Run scripts/run_train_deepspeed.sh
  2. The error above is raised.

Expected behavior
The forward pass should run without a dtype mismatch.

ds_report output
Please run ds_report to give us details about your setup.

Screenshots
If applicable, add screenshots to help explain your problem.

System info (please complete the following information):

Launcher context
Are you launching your experiment with the deepspeed launcher, MPI, or something else?

Docker context
Are you using a specific docker image that you can share?

Additional context
My ZeRO stage 1 config looks like this:

{
    "fp16": {
        "enabled": true,
        "auto_cast": true,
        "loss_scale": 0,
        "loss_scale_window": 1000,
        "initial_scale_power": 16,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "zero_optimization": {
        "stage": 1,
        "allgather_partitions": true,
        "allgather_bucket_size": 5e7,
        "overlap_comm": true,
        "reduce_scatter": true,
        "reduce_bucket_size": 5e7,
        "contiguous_gradients": true
    },
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "wall_clock_breakdown": false
}
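For context, here is a sketch of how a config like this is consumed at engine construction; the module and optimizer below are placeholders, not code from the repo:

import json
import torch
import deepspeed

net = torch.nn.Linear(16, 8)  # placeholder module, for illustration only

with open("ds_config.json") as f:  # the ZeRO-1 config above, saved to disk
    ds_config = json.load(f)

# NOTE: the "auto" values are normally filled in by a framework integration
# (e.g. HF Transformers); a bare deepspeed.initialize may need concrete
# numbers here instead.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=net,
    optimizer=torch.optim.Adam(net.parameters()),  # ZeRO stage 1 needs an optimizer
    config=ds_config,
)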
YooSungHyun commented 9 months ago

Maybe that option only works with deepspeed.initialize(training_data=...)? I don't build my data pipeline with DeepSpeed; I'm using torch.utils.data.Dataset and torch's DataLoader, not the DeepSpeed wrapper.
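For the record, the variant this comment alludes to would look roughly like the sketch below (same placeholder names as above); whether it actually changes the auto_cast behavior is exactly the open question here:

from torch.utils.data import TensorDataset

train_dataset = TensorDataset(torch.randn(64, 16), torch.randn(64, 8))  # placeholder data

# Given a Dataset, deepspeed.initialize builds the DataLoader itself and
# returns it as the third value.
model_engine, optimizer, train_loader, _ = deepspeed.initialize(
    model=net,
    optimizer=torch.optim.Adam(net.parameters()),
    training_data=train_dataset,
    config=ds_config,
)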

YooSungHyun commented 9 months ago

I pass arguments to the model as model(**batch), but DeepSpeed's auto_cast only handles positional arguments (*args), not keyword arguments; a hedged workaround sketch follows.
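If that keyword-argument limitation is the culprit, one workaround is to cast the floating-point tensors in the batch dict by hand before the call; cast_floats_half below is a hypothetical helper, not a DeepSpeed API:

import torch

def cast_floats_half(batch: dict) -> dict:
    # Cast only floating-point tensors to fp16, leaving integer tensors
    # (token ids, lengths, ...) untouched.
    return {
        k: v.half() if torch.is_tensor(v) and torch.is_floating_point(v) else v
        for k, v in batch.items()
    }

output = model(**cast_floats_half(batch))  # kwargs now match the fp16 weights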

Replacing it with model(batch["inputs"]) worked for me, but then I got an error in backward(). I'm also using a plain torch optimizer rather than a DeepSpeed one.

Found dtype Float but expected Half
  File "/data/bart/temp_workspace/pytorch-trainer/.venv/lib/python3.10/site-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/data/bart/temp_workspace/pytorch-trainer/.venv/lib/python3.10/site-packages/torch/_tensor.py", line 492, in backward
    torch.autograd.backward(
  File "/data/bart/temp_workspace/pytorch-trainer/.venv/lib/python3.10/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
    scaled_loss.backward(retain_graph=retain_graph)
  File "/data/bart/temp_workspace/pytorch-trainer/.venv/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 2019, in backward
    self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
  File "/data/bart/temp_workspace/pytorch-trainer/.venv/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1958, in backward
    self.optimizer.backward(loss, retain_graph=retain_graph)
  File "/data/bart/temp_workspace/pytorch-trainer/.venv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/data/bart/temp_workspace/pytorch-trainer/ds_train.py", line 115, in training_step
    model.backward(loss)
  File "/data/bart/temp_workspace/pytorch-trainer/trainer/deepspeed.py", line 233, in train_loop
    loss = self.training_step(model=model, batch=batch, batch_idx=batch_idx)
  File "/data/bart/temp_workspace/pytorch-trainer/trainer/deepspeed.py", line 155, in fit
    self.train_loop(
  File "/data/bart/temp_workspace/pytorch-trainer/ds_train.py", line 583, in main
    trainer.fit(
  File "/data/bart/temp_workspace/pytorch-trainer/ds_train.py", line 606, in <module>
    main(args)
RuntimeError: Found dtype Float but expected Half

In place of auto_cast, I'm now using torch.cuda.amp, which I'm fairly sure works, but will that cause any problems when utilizing offload, etc.?

# torch.cuda.amp autocast used in place of DeepSpeed's auto_cast
from torch.cuda.amp import autocast

with autocast(enabled=True, dtype=torch.float16):
    labels = batch.pop("labels")
    output = model(batch["inputs"])
    loss = self.criterion(output, labels)
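One detail worth noting about this pattern, judging from the traceback above: the DeepSpeed engine already applies its own dynamic loss scaling inside model.backward (scaled_loss.backward in deepspeed/runtime/fp16/loss_scaler.py), so torch.cuda.amp.GradScaler should not be stacked on top of it. A sketch of the rest of the step under that assumption (whether it avoids the Float/Half error in backward is not confirmed in this thread):

# DeepSpeed scales the loss itself per the fp16 config (loss_scale: 0
# selects dynamic scaling), so no torch GradScaler is used here.
model.backward(loss)  # engine backward, as in ds_train.py
model.step()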
npuichigo commented 5 months ago

Same issue here. I don't know whether torch.autocast can be used together with DeepSpeed fp16.