Neo9061 opened 4 weeks ago
Based on this notebook: https://github.com/huggingface/huggingface-llama-recipes/blob/main/local_inference/fp8-405B.ipynb

Since we are loading FP8, will it matter if we specify `torch_dtype` to be `torch.bfloat16`? CC @ianporada, who made recent edits to this notebook.

I believe it doesn't matter. Even if you didn't specify `torch_dtype`, it would default to the value in config.json, which is also `"torch_dtype": "bfloat16"`.

Keep in mind that not all weights are FP8; some weights of the quantized model are still BF16.
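For reference, a minimal sketch of how to check this yourself (the model id below is an assumption; use the one from the notebook). It shows that `from_pretrained` falls back to the dtype recorded in config.json when `torch_dtype` is not passed, and that the quantized weights keep their FP8 storage dtype while the remaining parameters follow `torch_dtype`:

```python
# Minimal sketch, assuming the FP8 checkpoint used in the notebook;
# the model id is a placeholder.
from collections import Counter

import torch
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "meta-llama/Llama-3.1-405B-Instruct-FP8"  # assumed id

# from_pretrained falls back to the dtype recorded in config.json when
# torch_dtype is not passed explicitly.
config = AutoConfig.from_pretrained(model_id)
print(config.torch_dtype)  # expected: torch.bfloat16, per the checkpoint's config.json

# Passing torch_dtype=torch.bfloat16 therefore matches the default behavior.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# The quantized weights keep their FP8 storage dtype; the non-quantized
# parameters (exact set depends on the quantization recipe) follow torch_dtype.
print(Counter(p.dtype for p in model.parameters()))
```

If you can't fit the 405B weights, the `AutoConfig` lines alone already answer the dtype-default question without loading the model.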