huggingface / transformers-bloom-inference

Fast Inference Solutions for BLOOM
Apache License 2.0

Question regarding float16 and bfloat16 #87

Closed · allanj closed this issue 1 year ago

allanj commented 1 year ago

https://github.com/huggingface/transformers-bloom-inference/blob/7bea3526d8270b4aeeefecc57d7d7d638e2bbe0e/bloom-inference-scripts/bloom-ds-inference.py#L121-L137

In this code, the dtype passed to with deepspeed.OnDevice(dtype=dtype, device="meta"): is float16, while in

model = AutoModelForCausalLM.from_config(config, torch_dtype=torch.bfloat16)

we use bfloat16.

Why is there this inconsistency?
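For reference, the pattern in question looks roughly like this (a simplified sketch of the linked lines, not the script's exact code; the model name and surrounding setup are placeholders):

```python
import torch
import deepspeed
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "bigscience/bloom"  # placeholder; the script takes this from CLI args
dtype = torch.float16            # the dtype handed to OnDevice

config = AutoConfig.from_pretrained(model_name)

# Parameters are created on the "meta" device, i.e. as shape/dtype metadata only,
# with no real storage allocated behind them.
with deepspeed.OnDevice(dtype=dtype, device="meta"):
    model = AutoModelForCausalLM.from_config(config, torch_dtype=torch.bfloat16)
```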

mayank31398 commented 1 year ago

@allanj It doesn't really matter. When the meta device is specified, only empty tensors are allocated (no real weight data), and both dtypes are 16-bit anyway.
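To illustrate the point (a minimal sketch, not code from the repo): tensors on the meta device carry only shape and dtype metadata with no backing storage, so the 16-bit dtype recorded at this stage affects neither memory use nor numerics. The dtype that matters is the one used when DeepSpeed later materializes and loads the real checkpoint weights, e.g. via deepspeed.init_inference(model, dtype=...).

```python
import torch

# A tensor on the "meta" device has no backing data, only shape and dtype metadata.
w = torch.empty(4096, 4096, dtype=torch.bfloat16, device="meta")
print(w.is_meta)         # True
print(w.element_size())  # 2 bytes per element, same for float16 and bfloat16

# Reading its values (e.g. w.tolist()) would fail: there is no data to read.
# The actual weight dtype is fixed only when the model is materialized later,
# e.g. deepspeed.init_inference(model, dtype=torch.float16, ...).
```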