quic / ai-hub-models

The Qualcomm® AI Hub Models are a collection of state-of-the-art machine learning models optimized for performance (latency, memory, etc.) and ready to deploy on Qualcomm® devices.
https://aihub.qualcomm.com
BSD 3-Clause "New" or "Revised" License

[Feature Request] Provide test input files for local testing #71

Open shifeiwen opened 1 month ago

shifeiwen commented 1 month ago

Is your feature request related to a problem? Please describe. I am trying to run the llama2 demo. Through export.py, I get several HTP .bin files. Could you also provide this model's test input files, so that I can run the model locally on my device using qnn-net-run? I think such files should exist on the cloud device; can I download them to my local machine? Thank you.

Describe the solution you'd like export.py should also export the test input files for the compiled model.

bhushan23 commented 1 month ago

Hi @shifeiwen, that's a great suggestion. We currently store user-provided data (the inference job's input dataset) and simply serialize it to numpy tensors for use with qnn-net-run.

You can serialize the input as follows:

  1. Serialize each value in the input data as a numpy file.
  2. Create an input_list.txt file mapping each input key to the relative path of its serialized file (see the sketch below this list).
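
For concreteness, here is a minimal sketch of those two steps. It assumes the compiled model consumes flat float32 buffers (written with numpy's tofile) and that input_list.txt uses the `input_name:=path` line format from the QNN SDK; the tensor names, shapes, and dtype below are placeholders, and `write_qnn_inputs` is a hypothetical helper, not part of this repo.

```python
import os

import numpy as np


def write_qnn_inputs(inputs: dict, out_dir: str = "qnn_inputs") -> str:
    """Write each tensor as a raw binary file and emit an input_list.txt.

    Hypothetical helper: `inputs` maps model input names to numpy arrays.
    Dtype and layout must match what the compiled context binary expects.
    """
    os.makedirs(out_dir, exist_ok=True)
    entries = []
    for name, arr in inputs.items():
        path = os.path.join(out_dir, f"{name}.raw")
        # qnn-net-run consumes flat binary buffers; cast to the model's input dtype.
        np.ascontiguousarray(arr, dtype=np.float32).tofile(path)
        entries.append(f"{name}:={path}")
    list_path = os.path.join(out_dir, "input_list.txt")
    with open(list_path, "w") as f:
        # One line per inference; all inputs for one inference share a line.
        f.write(" ".join(entries) + "\n")
    return list_path


# Example with placeholder input names and shapes; use your model's actual spec.
input_list = write_qnn_inputs({
    "input_ids": np.zeros((1, 1024), dtype=np.float32),
    "attention_mask": np.ones((1, 1024), dtype=np.float32),
})
```

You would then pass the generated file to qnn-net-run via its --input_list flag.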

You can refer to https://github.com/quic/ai-hub-models/blob/bb0ca2e36ada4f6831c6d77f5a27a5f21c0efc28/qai_hub_models/models/_shared/llama/app.py#L267 to see how to convert input_prompt into the input tensors for the first model part. By running each part in turn, you can quickly generate the inputs for the subsequent model parts.
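
As a rough illustration of that chaining (reusing the `write_qnn_inputs` sketch above): the context binary name, tensor names, and output layout below are assumptions, and the qnn-net-run flags should be checked against `qnn-net-run --help` for your SDK version. qnn-net-run writes one Result_<i> directory per input_list line.

```python
import subprocess

import numpy as np

# Placeholder prompt tensor; in practice, produce this with the tokenizer logic
# from qai_hub_models/models/_shared/llama/app.py referenced above.
prompt_ids = np.zeros((1, 1024), dtype=np.float32)

# Run model part 1 on the serialized inputs (file names are assumptions).
list_path = write_qnn_inputs({"input_ids": prompt_ids}, out_dir="part1_inputs")
subprocess.run(
    [
        "qnn-net-run",
        "--backend", "libQnnHtp.so",
        "--retrieve_context", "llama_part1.bin",
        "--input_list", list_path,
        "--output_dir", "part1_outputs",
    ],
    check=True,
)

# Load part 1's raw output and serialize it as part 2's input
# (output tensor name, dtype, and shape are assumptions).
hidden = np.fromfile("part1_outputs/Result_0/hidden_states.raw", dtype=np.float32)
write_qnn_inputs({"hidden_states": hidden}, out_dir="part2_inputs")
```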

Let us know if this unblocks running these models via qnn-net-run, or if you have any follow-up questions.