The Qualcomm Cloud AI SDK (Platform and Apps) enables high-performance deep learning inference on Qualcomm Cloud AI platforms, delivering high throughput and low latency across Computer Vision, Object Detection, Natural Language Processing, and Generative AI models.
In the blog post https://www.qualcomm.com/developer/blog/2024/01/qualcomm-cloud-ai-100-accelerates-large-language-model-inference-2x-using-microscaling-mx, the AI 100 achieved significant LLM inference results using microscaling (MX) formats. I am now using the AI 100 on an AWS dl2q instance to learn the `-mxfp6-matmul` option given by `qaic-run --exec`. Another way to learn MXFP6 is the code at https://github.com/microsoft/microxcaling. But I found some inconsistencies between them, as shown below:

Microscaling
Outputs: the values 0.0156, 0.0312, 0.0625, and 0.1250 were clamped to 0.

AI 100
After the ONNX model is generated, quantize it to a QAIC model with:
Finally, run the QAIC model with the Python API:
Output:
The result is 9.5, which differs from the expected 9.75 (0.25 + 0.5 + 1 + 8).
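For reference, the gap between 9.75 and 9.5 can be reproduced with a plain-Python sketch of MX FP6 (E2M3) block quantization following the OCP Microscaling spec. This is my own illustration, not code from either implementation: the input vector is reconstructed from the values quoted above (an assumption), and the alternative shared-scale exponent is only a guess at what the compiled model might be doing.

```python
import math

# Sketch of MX FP6 (E2M3) block quantization per the OCP Microscaling spec.
# E2M3: 1 sign bit, 2 exponent bits (bias 1), 3 mantissa bits
# -> max normal 7.5, smallest subnormal 0.125.
EMAX_E2M3 = 2  # largest unbiased element exponent for E2M3

def quantize_e2m3(x: float) -> float:
    """Round x to the nearest representable E2M3 value (ties to even)."""
    if x == 0.0:
        return 0.0
    sign = math.copysign(1.0, x)
    a = abs(x)
    if a >= 7.5:
        return sign * 7.5                 # saturate at the max normal
    # Subnormals share the exponent-0 grid, so clamp e into [0, EMAX].
    e = min(max(0, math.floor(math.log2(a))), EMAX_E2M3)
    step = 2.0 ** (e - 3)                 # spacing of the 3-bit mantissa grid
    return sign * round(a / step) * step  # Python round() = ties-to-even

def mx_quantize(block, scale_exp=None):
    """Quantize a block with one shared power-of-two scale.
    By default the scale exponent follows the MX spec:
    floor(log2(max |v|)) - emax_elem."""
    amax = max(abs(v) for v in block)
    if scale_exp is None:
        scale_exp = math.floor(math.log2(amax)) - EMAX_E2M3
    scale = 2.0 ** scale_exp
    return [quantize_e2m3(v / scale) * scale for v in block]

# Input reconstructed from the values quoted in the post (assumption).
vals = [0.0156, 0.0312, 0.0625, 0.125, 0.25, 0.5, 1.0, 8.0]

print(sum(mx_quantize(vals)))               # 9.75: spec scale (2^1); the four
                                            # smallest values round to 0
print(sum(mx_quantize(vals, scale_exp=2)))  # 9.5: a one-larger scale
                                            # (hypothetical) also drops 0.25
```

Under the spec's scale choice (2^1 for this block, since the max is 8 and emax is 2), exactly the four values reported by microxcaling round to zero (0.1250 is an exact tie and rounds down to even), giving 9.75. If the compiled model instead picked a shared scale one power of two larger, 0.25 also becomes a tie and the dot product drops to 9.5. That would make the discrepancy a shared-exponent-selection difference rather than an element-format difference, but this is only a hypothesis.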
Question
Are the implementations of qaic-mxfp6 and microscaling not completely consistent?