neuralmagic / AutoFP8


LLaMA3 report #37

Closed Eric-mingjie closed 1 month ago

Eric-mingjie commented 1 month ago

Hi, thanks for the great work.

I noticed that you have FP8 checkpoints for llama3.1-405b. However, in Section 6.2 of the LLaMA-3 technical report, the team mentions that they needed several workarounds to quantize the largest 405B model effectively: for example, they opt to skip quantization of the attention layers, and they also skip the first and last Transformer blocks.

I wonder if you have seen similar issues when quantizing llama3.1-405b.
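To make the question concrete, here is roughly how I imagine that skipping scheme would be expressed with AutoFP8. This is a minimal sketch, assuming the `BaseQuantizeConfig`/`ignore_patterns` API shown in the README; the regexes for the attention projections and the first/last blocks are my guesses at Llama module names, not your published recipe.

```python
# A minimal sketch, assuming AutoFP8's BaseQuantizeConfig / ignore_patterns API
# from the README; the module-name regexes for attention and the first/last
# blocks are my assumptions about Llama's layer names, not a published recipe.
from transformers import AutoTokenizer

from auto_fp8 import AutoFP8ForCausalLM, BaseQuantizeConfig

pretrained_model_dir = "meta-llama/Meta-Llama-3.1-405B-Instruct"
quantized_model_dir = "Meta-Llama-3.1-405B-Instruct-FP8"

# Tiny calibration batch just to make the example self-contained.
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir)
examples = tokenizer(["AutoFP8 calibration sample"], return_tensors="pt").to("cuda")

quantize_config = BaseQuantizeConfig(
    quant_method="fp8",
    activation_scheme="static",
    ignore_patterns=[
        "re:.*lm_head",              # lm_head is left unquantized in the README example
        "re:.*self_attn.*",          # skip attention projections (assumed module name)
        "re:model.layers.0\\..*",    # skip the first Transformer block (assumed name)
        "re:model.layers.125\\..*",  # skip the last block; 405B has 126 layers
    ],
)

model = AutoFP8ForCausalLM.from_pretrained(pretrained_model_dir, quantize_config=quantize_config)
model.quantize(examples)  # calibrate activation scales, then convert weights to FP8
model.save_quantized(quantized_model_dir)
```

Does the released 405B FP8 checkpoint use an ignore list along these lines, or did whole-model FP8 hold up fine in your evaluations?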