Closed. taeyeonlee closed this issue 2 months ago.
@taeyeonlee Hi there, sorry to bother you. I have a question that's not entirely related to this issue. When I was running the export for LLaMA, I encountered a problem with uploading to QAI-Hub, resulting in a compilation failure. Have you experienced this issue? In the compile job results I got: "Failed to load the encodings file from the uploaded .aimet directory. Please verify that it is a properly formatted .json file."
It looks like there is a pipeline workflow in AI Hub. I have the same question: how do I run these 8 files on my S24?
We know we do not have a good guide for what to do with these models and the integration part can be very challenging. We are actively working to improve this part of the story. Stay tuned.
The high-level overview we can give you until then is this:
The prompt processor is split into 4 parts, and the token generator is split into 4 parts; each part should be <2 GB. Then:

1. Load the four prompt processor parts.
2. Execute them one by one. At this point you can unload these parts.
3. Load the four token generator parts and keep them all loaded.
4. Execute them one by one to generate one token. Continue until your stopping criteria are met.
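The flow above can be sketched in plain Python. Note this is only a simulation of the control flow: the QnnPart class and its load/execute/unload methods are hypothetical stand-ins for whatever runtime API actually loads the QNN context binaries, not part of any Qualcomm SDK.

```python
class QnnPart:
    """Hypothetical wrapper around one <2 GB context-binary part."""

    def __init__(self, name):
        self.name = name
        self.loaded = False

    def load(self):
        self.loaded = True

    def unload(self):
        self.loaded = False

    def execute(self, state):
        # A real part would run inference; here we just record the hop.
        assert self.loaded, f"{self.name} must be loaded before execute"
        return state + [self.name]


def run_llama(prompt, max_tokens, is_stop):
    # 1) Load the four prompt-processor parts and run them once, in order.
    prompt_parts = [QnnPart(f"pp_{i}") for i in range(1, 5)]
    state = list(prompt)
    for part in prompt_parts:
        part.load()
    for part in prompt_parts:
        state = part.execute(state)

    # 2) The prompt processor is no longer needed; unload to free memory.
    for part in prompt_parts:
        part.unload()

    # 3) Load the four token-generator parts and keep them loaded.
    gen_parts = [QnnPart(f"tg_{i}") for i in range(1, 5)]
    for part in gen_parts:
        part.load()

    # 4) Run all four parts once per generated token, until stopping criteria.
    tokens = []
    for _ in range(max_tokens):
        for part in gen_parts:
            state = part.execute(state)
        token = f"tok{len(tokens)}"  # placeholder for real sampling/decoding
        tokens.append(token)
        if is_stop(token):
            break
    return tokens
```

The key point the simulation captures is the memory trade-off: the prompt-processor parts are transient (loaded, run once, unloaded), while the token-generator parts stay resident because they run on every decoded token.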
For faster response times we strongly recommend submitting any questions in our Slack Community.
"By the way, will AI-hub provide a method for power consumption measurement in the future? I'm a hobbyist developer, and I'm very concerned about the power consumption of my app when it's running on the SM8650. thank you!"
In the Qualcomm AI Stack I found a tutorial for Llama 2 (version 0.1.0.240612). Can I deploy these Llama 2 split bins by following that tutorial's steps to run the model in a Llama pipeline, skipping step 1 and step 2? Thanks for your help.
As @mestrona-3 mentioned, we are actively working on a tutorial to share with the community to help run these models efficiently on device. In the meantime,
@AndreaChiChengdu, you can use the referenced tutorial to get an understanding and run these models similarly. You will have to check the model I/O names to make sure the assets are configured correctly to run. Please give it a try and let us know how it goes.
Hi @MaTwickenham, there could be a problem with the model being uploaded during your run. Could you please give it another try, and also share the Hub job link as a reference?
Thanks @bhushan23! All, I'd like to close this GitHub issue as the original question has been resolved. For any follow up questions, please post your question and AI Hub job link in our Slack Community. Thanks!
@bhushan23 I will try it later. And the compile job id is jvgdzm8e5
Hi @MaTwickenham, I see that Llama2_PromptProcessor_1_Quantized.encodings is corrupted and is not the full encodings file it is supposed to be. Could you please check the .encodings file downloaded locally by our scripts? It is usually in the ~/.qaihm directory.
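Since the "Failed to load the encodings file" error mentioned in this thread points at a malformed .json file, a quick local sanity check of the downloaded .encodings file is possible before re-uploading. This is a minimal sketch; the ~/.qaihm location is taken from this thread, and the checks here (valid JSON, non-empty) are only a rough proxy for what the Hub actually validates.

```python
import json
from pathlib import Path


def check_encodings(path):
    """Return (ok, message) for an AIMET-style .encodings JSON file."""
    p = Path(path).expanduser()
    if not p.exists():
        return False, f"{p} does not exist"
    try:
        with open(p) as f:
            data = json.load(f)
    except json.JSONDecodeError as e:
        # A truncated download typically fails right here.
        return False, f"not valid JSON: {e}"
    if not data:
        return False, "file parsed but is empty (possibly truncated)"
    return True, f"OK ({p.stat().st_size} bytes)"
```

A corrupted or partially downloaded file will usually fail the JSON parse, which matches the "not the full encodings" symptom described above; deleting the cached file and re-running the export script should trigger a fresh download.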
Dear Qualcomm,

According to the Guide for the NN model (inception_v3) (file:///C:/Qualcomm/AIStack/QAIRT/2.24.0.240626/docs/QNN/general/tutorial2.html), the QNN context binary (Inception_v3_quantized.serialized.bin) was generated and run on my Galaxy S24 mobile:

/data/local/tmp/inception_v3 # ./qnn-net-run --backend libQnnHtp.so --input_list target_raw_list.txt --retrieve_context Inception_v3_quantized.serialized.bin
qnn-net-run pid:10047

Do you know how to run the QNN context binary for the Llama model on my Galaxy S24 mobile? There are 8 QNN context binary files for the Llama model generated in Qualcomm AI Hub; one of them was generated with [Job ID: jwgo43dd5 and m9m589p4m]:

[2024-07-04 06:19:27,319] [INFO] Running /qnn_sdk/bin/x86_64-linux-clang/qnn-context-binary-generator --backend /qnn_sdk/lib/x86_64-linux-clang/libQnnHtp.so --model /tmp/777fb919-eeff-41ed-b425-d60671b9e0b6cyqahkvz/tmpheomzjxn.so --output_dir /tmp/777fb919-eeff-41ed-b425-d60671b9e0b6cyqahkvz/tmpok8jolqd --binary_file qnn_model --config_file /tmp/777fb919-eeff-41ed-b425-d60671b9e0b6cyqahkvz/tmpok8jolqd/htp_context.json
[2024-07-04 06:25:45,508] [INFO] qnn-context-binary-generator pid:13485
[2024-07-04 06:25:46,312] [INFO] -=- Extracting input shape information (qnn-context-binary-utility) -=-
[2024-07-04 06:25:46,313] [INFO] Running /qnn_sdk/bin/x86_64-linux-clang/qnn-context-binary-utility --context_binary /tmp/777fb919-eeff-41ed-b425-d60671b9e0b6cyqahkvz/tmpd_6hlkmx/model.bin --json_file /tmp/777fb919-eeff-41ed-b425-d60671b9e0b6cyqahkvz/tmppkjtnrn2.json
[2024-07-04 06:25:47,530] [INFO] -=- Compilation completed -=-
What should I put in as --input_list for the Llama model?
Best Regards,
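On the --input_list question: in the QNN tutorials, the file passed to --input_list is a plain-text file where each line names the raw input tensor file(s) for one inference, and with multiple graph inputs the entries commonly take the form input_name:=path. The sketch below writes a dummy float32 raw tensor and a matching list file; the input name and element count are placeholders, so check your model's actual I/O names and shapes (for example with qnn-context-binary-utility) before using anything like this.

```python
import struct
from pathlib import Path


def write_input_list(out_dir, input_name, num_elements):
    """Write a dummy float32 raw tensor plus an input-list file for it.

    input_name and num_elements are assumptions to replace with the real
    graph input name and flattened tensor size for your model part.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Raw files are just the flat tensor bytes; zeros here stand in for
    # real data such as token ids or embeddings.
    raw = out / "input_0.raw"
    raw.write_bytes(struct.pack(f"{num_elements}f", *([0.0] * num_elements)))

    # One line per inference: "name:=path" for a named graph input.
    list_file = out / "target_raw_list.txt"
    list_file.write_text(f"{input_name}:={raw}\n")
    return list_file
```

For the split Llama parts, each part has its own input names and shapes, so each of the 8 binaries would need its own list file built this way, with the outputs of one part fed as the inputs of the next.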