Open ferdiko opened 4 months ago
Might be related to #6088
Your current environment
🐛 Describe the bug
I'm running the `examples/offline_inference.py` script with Mixtral 8x7B and FP8 quantization (tp=2). I'm on `main` at the latest commit as of now (47f0954af0a5aefd0db19875f6bdcbe933d055a9).
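The exact script is not included above; a minimal reproduction sketch along the lines described (vLLM's offline `LLM` API with `quantization="fp8"` and `tensor_parallel_size=2` — the model name and prompts here are illustrative assumptions) would look like:

```python
# Reproduction sketch, not the author's exact script.
# Requires vLLM installed and 2 GPUs with FP8 support.
from vllm import LLM, SamplingParams

prompts = ["Hello, my name is"]  # illustrative prompt
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # assumed checkpoint
    quantization="fp8",        # the flag that triggers the error below
    tensor_parallel_size=2,    # tp=2 as in the report
)

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```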
I get the following error. It only occurs when FP8 quantization is enabled; otherwise, the script runs fine.
If I run Llama3-8B instead, the script runs fine even with FP8 quantization. However, I still see the following warning: