shifeiwen opened this issue 1 month ago
Hi @shifeiwen, that's a great suggestion. We currently store the user-provided data (the inference job's input dataset) and simply serialize it to NumPy tensors for use with qnn-net-run.
You can serialize the input as follows.
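A minimal sketch of what that serialization can look like. This assumes the model takes float32 inputs; the file names, the tensor shape, and the `save_qnn_input` helper are hypothetical, chosen only for illustration — check your model's actual input dtype and shape before using this.

```python
import numpy as np

def save_qnn_input(tensor, path):
    # qnn-net-run consumes flat raw binary files; the dtype must match
    # the model's expected input type (float32 is assumed here).
    np.asarray(tensor, dtype=np.float32).tofile(path)

# Hypothetical example: a single 1x128 input tensor.
save_qnn_input(np.zeros((1, 128)), "input_0.raw")

# qnn-net-run locates its inputs through a plain-text list file, one
# line per inference, listing the raw file(s) for each graph input.
with open("input_list.txt", "w") as f:
    f.write("input_0.raw\n")
```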
You can refer to https://github.com/quic/ai-hub-models/blob/bb0ca2e36ada4f6831c6d77f5a27a5f21c0efc28/qai_hub_models/models/_shared/llama/app.py#L267 to get an understanding of how to convert input_prompt into tensors for the input of the first model.
By running each model part, you can quickly create inputs for the subsequent model parts.
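The chaining step above can be sketched as follows. The `run_part` function is a hypothetical stand-in for executing one exported sub-model (in practice that would be a qnn-net-run invocation or the app's own runner); the file names and shapes are illustrative assumptions.

```python
import numpy as np

def run_part(x):
    # Hypothetical placeholder for executing one sub-model part;
    # a real run would invoke qnn-net-run on the exported binary.
    return x * 0.5

# Tensor produced for the first model part (shape is illustrative).
prompt_tensor = np.ones((1, 128), dtype=np.float32)

# The output of one part is serialized as the raw input of the next,
# so each sub-model can be exercised with qnn-net-run in turn.
intermediate = run_part(prompt_tensor)
intermediate.astype(np.float32).tofile("part2_input_0.raw")
```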
Let us know if this unblocks you from running these models via qnn-net-run, or if you have any follow-up questions.
Is your feature request related to a problem? Please describe. I am trying to run the llama2 demo. Through export.py, I get several HTP .bin files. Can you provide this model's test input files as well, so that I can run the model on my device with qnn-net-run and test locally? I think such files should already exist on the cloud device. Can I download them to my local machine? Thank you.
Describe the solution you'd like export.py should also export the compiled model's test input files.