Performance issue - Githubissues

quic / ai-hub-models

The Qualcomm® AI Hub Models are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) and ready to deploy on Qualcomm® devices.

BSD 3-Clause "New" or "Revised" License

Hello, I am seeing a performance problem when using the NPU (HTP, a.k.a. cDSP) on Snapdragon 8 Gen 3. Here are the details.

When we call qhblas_hvx_ah_matrix_vector_mpy_ab from the qhl_hvx library in the Hexagon SDK, it is much slower than computing the same matrix-vector product directly on the CPU with Arm Neon. The results are shown below.

cDSP (NPU): [13008, 5120] x [5120, 1], 45 ms
CPU: [13008, 5120] x [5120, 1], 10 ms

After setting the power mode to performance mode, the cDSP execution time improves slightly but is still slower than the CPU.

cDSP (NPU): [13008, 5120] x [5120, 1], 36 ms
CPU: [13008, 5120] x [5120, 1], 10 ms

Are these results expected, and are they consistent with your own tests? Looking forward to your response.
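To put the timings above in perspective, here is a rough sketch that converts each measurement into effective multiply-accumulate throughput and the rate at which the matrix has to be read. The helper name gemv_stats is made up for illustration, and the 2-byte element size is an assumption (16-bit elements); the measurements themselves do not state the element width, so adjust bytes_per_element if it differs.

```python
def gemv_stats(rows, cols, time_ms, bytes_per_element=2):
    """Effective throughput of a [rows, cols] x [cols, 1] product.

    bytes_per_element=2 is an assumption (16-bit elements), not a
    value taken from the reported measurements.
    """
    macs = rows * cols                      # one multiply-accumulate per matrix element
    seconds = time_ms / 1000.0
    matrix_bytes = macs * bytes_per_element # every matrix element is read once
    return {
        "gmacs_per_s": macs / seconds / 1e9,
        "gbytes_per_s": matrix_bytes / seconds / 1e9,
    }


if __name__ == "__main__":
    # Timings are the ones reported in this issue.
    measurements = [
        ("cDSP, default power mode", 45),
        ("cDSP, performance mode", 36),
        ("CPU (Arm Neon)", 10),
    ]
    for label, time_ms in measurements:
        stats = gemv_stats(13008, 5120, time_ms)
        print(f"{label}: {stats['gmacs_per_s']:.2f} GMAC/s, "
              f"{stats['gbytes_per_s']:.2f} GB/s of matrix data")
```

A matrix-vector product touches each matrix element only once, so the matrix read rate is usually the more informative of the two numbers when comparing the cDSP and CPU paths.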

Performance issue #28