huggingface / transformers-bloom-inference

Fast Inference Solutions for BLOOM
Apache License 2.0

Bloom176B RuntimeError: expected scalar type Half but found BFloat16 #89

Closed wohenniubi closed 1 year ago

wohenniubi commented 1 year ago

Run cmd & Error:

Using the NVIDIA PyTorch docker image 23.04 on A100, run the Bloom176B command:

```shell
deepspeed --num_gpus 8 --module inference_server.benchmark --model_name bigscience/bloom --model_class AutoModelForCausalLM --dtype fp16 --deployment_framework ds_inference --benchmark_cycles 5 --batch_size 1
```

It leads to the following error, while the Bloom-7b1 model has no such issue.

  File "/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py", line 2521, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: expected scalar type Half but found BFloat16
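The error comes from `torch.layer_norm` requiring the activations and the affine parameters to share one dtype. A minimal sketch of that rule (shapes and tensor names here are illustrative, not taken from BLOOM):

```python
import torch
import torch.nn.functional as F

# torch.layer_norm expects the input and the affine parameters to share
# a single dtype, so an fp16/bf16 mix anywhere in the model triggers
# "expected scalar type Half but found BFloat16" (or the reverse).
x = torch.randn(2, 4, dtype=torch.bfloat16)   # bf16 activations
w = torch.ones(4, dtype=torch.float16)        # fp16 LayerNorm weight, mismatched on purpose
b = torch.zeros(4, dtype=torch.float16)

try:
    F.layer_norm(x, (4,), w, b)
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True

# Casting the parameters to the activation dtype resolves the mismatch
out = F.layer_norm(x, (4,), w.to(x.dtype), b.to(x.dtype))
print(mismatch_raised, out.dtype)
```

This only demonstrates the dtype rule; in the reported setup the mismatch is introduced inside the DeepSpeed inference kernels, not in user code.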

The complete stack trace is attached here: Bloom176B_error_BFloat16.txt. Please kindly take a look; any prompt suggestions are welcome.

Further info:

  1. Bloom176B used to work on the same A100 with the same docker image, and the driver version has not changed. The only change I made was deleting the checkpoint to save disk space and then re-downloading it.
  2. Using a similar run command, Bloom7B1 works well, with no such "expected scalar type Half but found BFloat16" error.


lyy1994 commented 1 year ago

Got the same issue.

lyy1994 commented 1 year ago

It looks like this is a problem with the newer version of DeepSpeed. Everything works fine after I downgrade to deepspeed==0.7.3.
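The downgrade amounts to pinning the package (commands below are the standard pip workflow, not specific to this repo):

```shell
# Pin DeepSpeed to the last version reported working in this thread
pip install deepspeed==0.7.3

# Confirm the pinned version is active before re-running the benchmark
python -c "import deepspeed; print(deepspeed.__version__)"
```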

wohenniubi commented 1 year ago

> It looks like this is a problem with the newer version of DeepSpeed. Everything works fine after I downgrade to deepspeed==0.7.3.

Many thanks for the help, and sorry for the late response. I just tested your suggestion and yes, Bloom176B works after downgrading deepspeed to 0.7.3. This issue can thus be closed.