quic / ai-hub-models

The Qualcomm® AI Hub Models are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) and ready to deploy on Qualcomm® devices.
https://aihub.qualcomm.com
BSD 3-Clause "New" or "Revised" License

Performance issue #28

Closed YixinSong-e closed 3 months ago

YixinSong-e commented 3 months ago

Hello, I am running into a performance problem when using the NPU (HTP, a.k.a. cDSP) on a Snapdragon 8 Gen 3. Here are the details.

When we call qhblas_hvx_ah_matrix_vector_mpy_ab from the qhl_hvx library in the Hexagon SDK, we find that it is much slower than computing the same product directly on the CPU with Arm Neon. The results are shown below.

Device       Matrix          Vector      Time
cDSP (NPU)   [13008, 5120]   [5120, 1]   45 ms
CPU          [13008, 5120]   [5120, 1]   10 ms

After we set the power mode to performance mode, the cDSP execution is a little faster, but it is still slower than the CPU.

Device       Matrix          Vector      Time
cDSP (NPU)   [13008, 5120]   [5120, 1]   36 ms
CPU          [13008, 5120]   [5120, 1]   10 ms

I would like to know whether these results are expected and whether they are consistent with your own tests. Looking forward to your response.
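To put the reported timings in perspective, the following minimal Python sketch converts them into multiply-accumulate throughput and slowdown relative to the CPU. The shapes and timings come from the report above; the helper names are hypothetical, and `bytes_per_element` is an assumption (the report does not state the element width), so the bandwidth figure is only illustrative.

```python
# Back-of-the-envelope throughput from the timings quoted in the report.
# Helper names are illustrative; bytes_per_element is an assumption.

ROWS, COLS = 13008, 5120  # matrix shape from the report; the vector is [5120, 1]

# Timings quoted in the report, in milliseconds.
TIMINGS_MS = {
    "cDSP (default power mode)": 45.0,
    "cDSP (performance mode)": 36.0,
    "CPU (Arm Neon)": 10.0,
}


def macs(rows, cols):
    """Multiply-accumulates in one matrix-vector product: one per matrix element."""
    return rows * cols


def gmacs_per_second(rows, cols, elapsed_ms):
    """Sustained throughput in units of 1e9 multiply-accumulates per second."""
    return macs(rows, cols) / (elapsed_ms / 1e3) / 1e9


def matrix_gb_per_second(rows, cols, elapsed_ms, bytes_per_element=1):
    """Rate at which the matrix is streamed, in GB/s (element width is assumed)."""
    return rows * cols * bytes_per_element / (elapsed_ms / 1e3) / 1e9


def slowdown(elapsed_ms, baseline_ms):
    """How many times slower a run is than the baseline."""
    return elapsed_ms / baseline_ms


if __name__ == "__main__":
    cpu_ms = TIMINGS_MS["CPU (Arm Neon)"]
    for name, ms in TIMINGS_MS.items():
        print(
            f"{name}: {gmacs_per_second(ROWS, COLS, ms):.2f} GMAC/s, "
            f"{matrix_gb_per_second(ROWS, COLS, ms):.2f} GB/s (1 byte/element assumed), "
            f"{slowdown(ms, cpu_ms):.1f}x the CPU time"
        )
```

One product touches 13008 × 5120 = 66,600,960 matrix elements, so the reported times correspond to roughly 1.5 GMAC/s (45 ms), 1.9 GMAC/s (36 ms), and 6.7 GMAC/s (10 ms). Because each matrix element is used only once in a matrix-vector product, the time is usually dominated by how fast the matrix can be read from memory, which is why the sketch also reports a streaming rate.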

mestrona-3 commented 3 months ago

After we chatted on Slack, Yixin mentioned that they were using the QNN SDK to run their model when they hit this issue, and that they have since resolved it. Closing this, as no action is needed here.