huggingface / transformers-bloom-inference

Fast Inference Solutions for BLOOM
Apache License 2.0

Bloom176B RuntimeError: expected scalar type Half but found BFloat16 #89

Closed wohenniubi closed 1 year ago

wohenniubi commented 1 year ago

Run cmd & Error:

Using the NVIDIA PyTorch docker image 23.04 on A100, run the Bloom176B command:

```shell
deepspeed --num_gpus 8 --module inference_server.benchmark --model_name bigscience/bloom --model_class AutoModelForCausalLM --dtype fp16 --deployment_framework ds_inference --benchmark_cycles 5 --batch_size 1
```

It leads to the following error, while the Bloom-7b1 model has no such issue.

  File "/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py", line 2521, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: expected scalar type Half but found BFloat16
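The error comes from `torch.layer_norm` requiring the activations and the affine parameters to share one dtype. A minimal sketch of that rule (shapes and tensor names here are illustrative, not taken from BLOOM):

```python
import torch
import torch.nn.functional as F

# torch.layer_norm expects the input and the affine parameters to share
# a single dtype, so an fp16/bf16 mix anywhere in the model triggers
# "expected scalar type Half but found BFloat16" (or the reverse).
x = torch.randn(2, 4, dtype=torch.bfloat16)   # bf16 activations
w = torch.ones(4, dtype=torch.float16)        # fp16 LayerNorm weight, mismatched on purpose
b = torch.zeros(4, dtype=torch.float16)

try:
    F.layer_norm(x, (4,), w, b)
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True

# Casting the parameters to the activation dtype resolves the mismatch
out = F.layer_norm(x, (4,), w.to(x.dtype), b.to(x.dtype))
print(mismatch_raised, out.dtype)
```

This only demonstrates the dtype rule; in the reported setup the mismatch is introduced inside the DeepSpeed inference kernels, not in user code.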

The complete stack trace is attached here: Bloom176B_error_BFloat16.txt. Please kindly take a look; any prompt suggestions are welcome.

Further info:

  1. Bloom176B used to work on the same A100 with the same docker image, and the driver version has not changed. The only change I made was deleting the checkpoint to save disk space and then re-downloading it.
  2. Using a similar run command, Bloom7B1 works well, with no such "expected scalar type Half but found BFloat16" error.


lyy1994 commented 1 year ago

Got the same issue.

lyy1994 commented 1 year ago

It looks like this is a problem with the newer version of DeepSpeed. Everything works fine after I downgrade to deepspeed==0.7.3.
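The downgrade amounts to pinning the package (commands below are the standard pip workflow, not specific to this repo):

```shell
# Pin DeepSpeed to the last version reported working in this thread
pip install deepspeed==0.7.3

# Confirm the pinned version is active before re-running the benchmark
python -c "import deepspeed; print(deepspeed.__version__)"
```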

wohenniubi commented 1 year ago

> It looks like this is a problem with the newer version of DeepSpeed. Everything works fine after I downgrade to deepspeed==0.7.3.

Many thanks for the help, and sorry for the late response. I just tested your suggestion and yes, Bloom176B works after downgrading deepspeed to 0.7.3. This issue can thus be closed.