saic-fi / MobileQuant

[EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models

PPL test result #1

Closed shifeiwen closed 4 weeks ago

shifeiwen commented 2 months ago

This is a very valuable project for research. I tried it out: the C++ demo of LLaMA-1.1B W4A8 runs at 18 t/s. Although the model's output is not what I asked for, the overall syntax seems fine. I have two questions:

  1. Are there more detailed PPL results? Gemma-2B FP32 scores 51.3 on MMLU, but the result you report is 25.8, which seems to be off by a large margin.
  2. Do I understand correctly that the W4A8 and W8A8 quantization is implemented with your MobileQuant method, the model is then saved via AIMET, and finally run with QNN for inference?

I think this project is a good start. Thanks to the project team for their contribution.
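For context on what the W4A8 setting in question 2 means numerically, here is a minimal sketch of uniform symmetric fake-quantization (quantize to integers, then dequantize), with 4-bit weights and 8-bit activations. This is only an illustration of the quantization arithmetic, not the repo's actual MobileQuant/AIMET/QNN pipeline; all names here are illustrative.

```python
import numpy as np

def fake_quantize(x, num_bits):
    """Symmetric uniform fake-quantization: round to a num_bits integer
    grid scaled by the tensor's max magnitude, then dequantize back."""
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 7 for 4-bit, 127 for 8-bit
    scale = np.max(np.abs(x)) / qmax          # per-tensor scale (illustrative)
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8)).astype(np.float32)   # weights -> 4-bit
a = rng.normal(size=(8,)).astype(np.float32)     # activations -> 8-bit

y_fp = W @ a                                     # full-precision reference
y_q = fake_quantize(W, 4) @ fake_quantize(a, 8)  # simulated W4A8 matmul
print("max abs error:", np.abs(y_fp - y_q).max())
```

With symmetric rounding, the per-element quantization error is bounded by half the scale, which is why 4-bit weights dominate the error budget compared to 8-bit activations.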
fwtan commented 2 months ago

Thank you for your interest in our work!

fwtan commented 4 weeks ago

Feel free to reopen the issue if there are any further questions.