Closed: ianporada closed this issue 1 month ago.
Or it looks like this line was just left over from the previous template. In that case I'm wondering if `torch_dtype="auto"` should be used rather than `torch_dtype=torch.bfloat16` when loading `"meta-llama/Meta-Llama-3.1-405B-Instruct-FP8"`, since certain weights are F8_E4M3.
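If it helps, here is a minimal sketch of what I mean (my assumption, not the notebook's exact code):

```python
# Minimal sketch (not the notebook's exact code): torch_dtype="auto" lets
# from_pretrained pick up the dtypes stored in the checkpoint instead of
# forcing everything to bfloat16, so the F8_E4M3 tensors keep their dtype.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3.1-405B-Instruct-FP8"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # respect the per-tensor dtypes in the checkpoint
    device_map="auto",    # shard across available devices
)
```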
Hey @ianporada, thanks for the report. The line is indeed a leftover from a previous template. Since the model is already quantized, you don't need to specify the `quantization_config`. Would you like to submit a PR to fix the notebook? Thanks!
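As a quick sanity check (a sketch, assuming you have access to the gated repo), the checkpoint already carries its quantization settings in its config, which is why nothing extra needs to be passed at load time:

```python
# Sketch: a pre-quantized checkpoint ships its quantization settings in config.json,
# so from_pretrained picks them up automatically without a quantization_config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3.1-405B-Instruct-FP8")
print(getattr(config, "quantization_config", None))  # expected: an fbgemm_fp8 entry
```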
Sure! Created a pull request: https://github.com/huggingface/huggingface-llama-recipes/pull/40
I'm also curious why some of the 405B-FP8 weights are FP32, i.e. larger than in the original model, but that's a separate question, so I've asked in the forum: https://discuss.huggingface.co/t/why-are-some-weights-fp32-in-llama-3-1-405b-fbgemm-fp8-quantiziation/108922
Thanks!
huggingface-llama-recipes/fp8-405B.ipynb loads `"meta-llama/Meta-Llama-3.1-405B-Instruct-FP8"` with `torch_dtype=torch.bfloat16`, but no `quantization_config` is passed. Maybe the model was intended to be loaded with a `quantization_config` such as the one sketched below? @SunMarc
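A hypothetical sketch of the kind of `quantization_config` this could refer to (assuming `FbgemmFp8Config`; as noted in the reply above, it isn't needed for the pre-quantized checkpoint and would only apply when quantizing the BF16 weights yourself):

```python
# Hypothetical sketch, not the notebook's code: on-the-fly FP8 quantization of the
# BF16 checkpoint via FbgemmFp8Config (requires fbgemm-gpu). The *-FP8 checkpoint
# is already quantized, so this config is unnecessary when loading it.
from transformers import AutoModelForCausalLM, FbgemmFp8Config

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-405B-Instruct",  # original BF16 weights
    quantization_config=FbgemmFp8Config(),
    device_map="auto",
)
```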