quic / ai-hub-models

The Qualcomm® AI Hub Models are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) and ready to deploy on Qualcomm® devices.
https://aihub.qualcomm.com
BSD 3-Clause "New" or "Revised" License

[BUG] fail to convert QNN Source to QNN Model Library (qnn-model-lib-generator) for Llama v2_7b_chat_quantized #64

Open taeyeonlee opened 2 days ago

taeyeonlee commented 2 days ago

**Describe the bug**
[AI Hub Job ID: j2p0v386g] Converting QNN Source to a QNN Model Library (qnn-model-lib-generator) fails for Llama v2_7b_chat_quantized.

**To Reproduce**
1. `pip install qai-hub`
2. `qai-hub configure --api_token xxxxxxxx`
3. `pip install "qai_hub_models[llama_v2_7b_chat_quantized]"`
4. `python -m qai_hub_models.models.llama_v2_7b_chat_quantized.export`
5. Go to AI Hub: https://app.aihub.qualcomm.com/jobs/j2p0v386g/
6. See the result log

**Expected behavior**
The QNN Source (CPP, BIN) should be converted successfully to a QNN Model Library (.so).

Host configuration:

Log [2024-07-03 12:02:19,199] [INFO] -=- Converting ONNX graph: decompose Gelu layers -=- [2024-07-03 12:02:32,588] [INFO] -=- ONNX to QNN Source (qnn-onnx-converter) -=- [2024-07-03 12:02:32,588] [INFO] Running /tetra/tetra_env_qnn/bin/python3 /qnn_sdk/bin/x86_64-linux-clang/qnn-onnx-converter --input_network /tmp/868f0d23-9714-4ada-add4-922248dce525abckycxm/tmpknhvr5/tmpycc3oq6t.onnx --output_path /tmp/868f0d23-9714-4ada-add4-922248dce525abckycxm/tmpknhvr5/tmpycc3oq6t.cpp --preserve_io layout input_ids attention_mask position_ids_cos position_ids_sin layers_7_add_out_0 past_key_0_h0_out past_key_0_h1_out past_key_0_h2_out past_key_0_h3_out past_key_0_h4_out past_key_0_h5_out past_key_0_h6_out past_key_0_h7_out past_key_0_h8_out past_key_0_h9_out past_key_0_h10_out past_key_0_h11_out past_key_0_h12_out past_key_0_h13_out past_key_0_h14_out past_key_0_h15_out past_key_0_h16_out past_key_0_h17_out past_key_0_h18_out past_key_0_h19_out past_key_0_h20_out past_key_0_h21_out past_key_0_h22_out past_key_0_h23_out past_key_0_h24_out past_key_0_h25_out past_key_0_h26_out past_key_0_h27_out past_key_0_h28_out past_key_0_h29_out past_key_0_h30_out past_key_0_h31_out past_value_0_h0_out past_value_0_h1_out past_value_0_h2_out past_value_0_h3_out past_value_0_h4_out past_value_0_h5_out past_value_0_h6_out past_value_0_h7_out past_value_0_h8_out past_value_0_h9_out past_value_0_h10_out past_value_0_h11_out past_value_0_h12_out past_value_0_h13_out past_value_0_h14_out past_value_0_h15_out past_value_0_h16_out past_value_0_h17_out past_value_0_h18_out past_value_0_h19_out past_value_0_h20_out past_value_0_h21_out past_value_0_h22_out past_value_0_h23_out past_value_0_h24_out past_value_0_h25_out past_value_0_h26_out past_value_0_h27_out past_value_0_h28_out past_value_0_h29_out past_value_0_h30_out past_value_0_h31_out past_key_1_h0_out past_key_1_h1_out past_key_1_h2_out past_key_1_h3_out past_key_1_h4_out past_key_1_h5_out past_key_1_h6_out past_key_1_h7_out past_key_1_h8_out 
past_key_1_h9_out past_key_1_h10_out past_key_1_h11_out past_key_1_h12_out past_key_1_h13_out past_key_1_h14_out past_key_1_h15_out past_key_1_h16_out past_key_1_h17_out past_key_1_h18_out past_key_1_h19_out past_key_1_h20_out past_key_1_h21_out past_key_1_h22_out past_key_1_h23_out past_key_1_h24_out past_key_1_h25_out past_key_1_h26_out past_key_1_h27_out past_key_1_h28_out past_key_1_h29_out past_key_1_h30_out past_key_1_h31_out past_value_1_h0_out past_value_1_h1_out past_value_1_h2_out past_value_1_h3_out past_value_1_h4_out past_value_1_h5_out past_value_1_h6_out past_value_1_h7_out past_value_1_h8_out past_value_1_h9_out past_value_1_h10_out past_value_1_h11_out past_value_1_h12_out past_value_1_h13_out past_value_1_h14_out past_value_1_h15_out past_value_1_h16_out past_value_1_h17_out past_value_1_h18_out past_value_1_h19_out past_value_1_h20_out past_value_1_h21_out past_value_1_h22_out past_value_1_h23_out past_value_1_h24_out past_value_1_h25_out past_value_1_h26_out past_value_1_h27_out past_value_1_h28_out past_value_1_h29_out past_value_1_h30_out past_value_1_h31_out past_key_2_h0_out past_key_2_h1_out past_key_2_h2_out past_key_2_h3_out past_key_2_h4_out past_key_2_h5_out past_key_2_h6_out past_key_2_h7_out past_key_2_h8_out past_key_2_h9_out past_key_2_h10_out past_key_2_h11_out past_key_2_h12_out past_key_2_h13_out past_key_2_h14_out past_key_2_h15_out past_key_2_h16_out past_key_2_h17_out past_key_2_h18_out past_key_2_h19_out past_key_2_h20_out past_key_2_h21_out past_key_2_h22_out past_key_2_h23_out past_key_2_h24_out past_key_2_h25_out past_key_2_h26_out past_key_2_h27_out past_key_2_h28_out past_key_2_h29_out past_key_2_h30_out past_key_2_h31_out past_value_2_h0_out past_value_2_h1_out past_value_2_h2_out past_value_2_h3_out past_value_2_h4_out past_value_2_h5_out past_value_2_h6_out past_value_2_h7_out past_value_2_h8_out past_value_2_h9_out past_value_2_h10_out past_value_2_h11_out past_value_2_h12_out past_value_2_h13_out 
past_value_2_h14_out past_value_2_h15_out past_value_2_h16_out past_value_2_h17_out past_value_2_h18_out past_value_2_h19_out past_value_2_h20_out past_value_2_h21_out past_value_2_h22_out past_value_2_h23_out past_value_2_h24_out past_value_2_h25_out past_value_2_h26_out past_value_2_h27_out past_value_2_h28_out past_value_2_h29_out past_value_2_h30_out past_value_2_h31_out past_key_3_h0_out past_key_3_h1_out past_key_3_h2_out past_key_3_h3_out past_key_3_h4_out past_key_3_h5_out past_key_3_h6_out past_key_3_h7_out past_key_3_h8_out past_key_3_h9_out past_key_3_h10_out past_key_3_h11_out past_key_3_h12_out past_key_3_h13_out past_key_3_h14_out past_key_3_h15_out past_key_3_h16_out past_key_3_h17_out past_key_3_h18_out past_key_3_h19_out past_key_3_h20_out past_key_3_h21_out past_key_3_h22_out past_key_3_h23_out past_key_3_h24_out past_key_3_h25_out past_key_3_h26_out past_key_3_h27_out past_key_3_h28_out past_key_3_h29_out past_key_3_h30_out past_key_3_h31_out past_value_3_h0_out past_value_3_h1_out past_value_3_h2_out past_value_3_h3_out past_value_3_h4_out past_value_3_h5_out past_value_3_h6_out past_value_3_h7_out past_value_3_h8_out past_value_3_h9_out past_value_3_h10_out past_value_3_h11_out past_value_3_h12_out past_value_3_h13_out past_value_3_h14_out past_value_3_h15_out past_value_3_h16_out past_value_3_h17_out past_value_3_h18_out past_value_3_h19_out past_value_3_h20_out past_value_3_h21_out past_value_3_h22_out past_value_3_h23_out past_value_3_h24_out past_value_3_h25_out past_value_3_h26_out past_value_3_h27_out past_value_3_h28_out past_value_3_h29_out past_value_3_h30_out past_value_3_h31_out past_key_4_h0_out past_key_4_h1_out past_key_4_h2_out past_key_4_h3_out past_key_4_h4_out past_key_4_h5_out past_key_4_h6_out past_key_4_h7_out past_key_4_h8_out past_key_4_h9_out past_key_4_h10_out past_key_4_h11_out past_key_4_h12_out past_key_4_h13_out past_key_4_h14_out past_key_4_h15_out past_key_4_h16_out past_key_4_h17_out past_key_4_h18_out 
past_key_4_h19_out past_key_4_h20_out past_key_4_h21_out past_key_4_h22_out past_key_4_h23_out past_key_4_h24_out past_key_4_h25_out past_key_4_h26_out past_key_4_h27_out past_key_4_h28_out past_key_4_h29_out past_key_4_h30_out past_key_4_h31_out past_value_4_h0_out past_value_4_h1_out past_value_4_h2_out past_value_4_h3_out past_value_4_h4_out past_value_4_h5_out past_value_4_h6_out past_value_4_h7_out past_value_4_h8_out past_value_4_h9_out past_value_4_h10_out past_value_4_h11_out past_value_4_h12_out past_value_4_h13_out past_value_4_h14_out past_value_4_h15_out past_value_4_h16_out past_value_4_h17_out past_value_4_h18_out past_value_4_h19_out past_value_4_h20_out past_value_4_h21_out past_value_4_h22_out past_value_4_h23_out past_value_4_h24_out past_value_4_h25_out past_value_4_h26_out past_value_4_h27_out past_value_4_h28_out past_value_4_h29_out past_value_4_h30_out past_value_4_h31_out past_key_5_h0_out past_key_5_h1_out past_key_5_h2_out past_key_5_h3_out past_key_5_h4_out past_key_5_h5_out past_key_5_h6_out past_key_5_h7_out past_key_5_h8_out past_key_5_h9_out past_key_5_h10_out past_key_5_h11_out past_key_5_h12_out past_key_5_h13_out past_key_5_h14_out past_key_5_h15_out past_key_5_h16_out past_key_5_h17_out past_key_5_h18_out past_key_5_h19_out past_key_5_h20_out past_key_5_h21_out past_key_5_h22_out past_key_5_h23_out past_key_5_h24_out past_key_5_h25_out past_key_5_h26_out past_key_5_h27_out past_key_5_h28_out past_key_5_h29_out past_key_5_h30_out past_key_5_h31_out past_value_5_h0_out past_value_5_h1_out past_value_5_h2_out past_value_5_h3_out past_value_5_h4_out past_value_5_h5_out past_value_5_h6_out past_value_5_h7_out past_value_5_h8_out past_value_5_h9_out past_value_5_h10_out past_value_5_h11_out past_value_5_h12_out past_value_5_h13_out past_value_5_h14_out past_value_5_h15_out past_value_5_h16_out past_value_5_h17_out past_value_5_h18_out past_value_5_h19_out past_value_5_h20_out past_value_5_h21_out past_value_5_h22_out 
past_value_5_h23_out past_value_5_h24_out past_value_5_h25_out past_value_5_h26_out past_value_5_h27_out past_value_5_h28_out past_value_5_h29_out past_value_5_h30_out past_value_5_h31_out past_key_6_h0_out past_key_6_h1_out past_key_6_h2_out past_key_6_h3_out past_key_6_h4_out past_key_6_h5_out past_key_6_h6_out past_key_6_h7_out past_key_6_h8_out past_key_6_h9_out past_key_6_h10_out past_key_6_h11_out past_key_6_h12_out past_key_6_h13_out past_key_6_h14_out past_key_6_h15_out past_key_6_h16_out past_key_6_h17_out past_key_6_h18_out past_key_6_h19_out past_key_6_h20_out past_key_6_h21_out past_key_6_h22_out past_key_6_h23_out past_key_6_h24_out past_key_6_h25_out past_key_6_h26_out past_key_6_h27_out past_key_6_h28_out past_key_6_h29_out past_key_6_h30_out past_key_6_h31_out past_value_6_h0_out past_value_6_h1_out past_value_6_h2_out past_value_6_h3_out past_value_6_h4_out past_value_6_h5_out past_value_6_h6_out past_value_6_h7_out past_value_6_h8_out past_value_6_h9_out past_value_6_h10_out past_value_6_h11_out past_value_6_h12_out past_value_6_h13_out past_value_6_h14_out past_value_6_h15_out past_value_6_h16_out past_value_6_h17_out past_value_6_h18_out past_value_6_h19_out past_value_6_h20_out past_value_6_h21_out past_value_6_h22_out past_value_6_h23_out past_value_6_h24_out past_value_6_h25_out past_value_6_h26_out past_value_6_h27_out past_value_6_h28_out past_value_6_h29_out past_value_6_h30_out past_value_6_h31_out past_key_7_h0_out past_key_7_h1_out past_key_7_h2_out past_key_7_h3_out past_key_7_h4_out past_key_7_h5_out past_key_7_h6_out past_key_7_h7_out past_key_7_h8_out past_key_7_h9_out past_key_7_h10_out past_key_7_h11_out past_key_7_h12_out past_key_7_h13_out past_key_7_h14_out past_key_7_h15_out past_key_7_h16_out past_key_7_h17_out past_key_7_h18_out past_key_7_h19_out past_key_7_h20_out past_key_7_h21_out past_key_7_h22_out past_key_7_h23_out past_key_7_h24_out past_key_7_h25_out past_key_7_h26_out past_key_7_h27_out past_key_7_h28_out 
past_key_7_h29_out past_key_7_h30_out past_key_7_h31_out past_value_7_h0_out past_value_7_h1_out past_value_7_h2_out past_value_7_h3_out past_value_7_h4_out past_value_7_h5_out past_value_7_h6_out past_value_7_h7_out past_value_7_h8_out past_value_7_h9_out past_value_7_h10_out past_value_7_h11_out past_value_7_h12_out past_value_7_h13_out past_value_7_h14_out past_value_7_h15_out past_value_7_h16_out past_value_7_h17_out past_value_7_h18_out past_value_7_h19_out past_value_7_h20_out past_value_7_h21_out past_value_7_h22_out past_value_7_h23_out past_value_7_h24_out past_value_7_h25_out past_value_7_h26_out past_value_7_h27_out past_value_7_h28_out past_value_7_h29_out past_value_7_h30_out past_value_7_h31_out --input_layout attention_mask NONTRIVIAL --input_layout position_ids_cos NONTRIVIAL --input_layout position_ids_sin NONTRIVIAL --input_list /tmp/868f0d23-9714-4ada-add4-922248dce525abckycxm/tmpknhvr5/input_list.txt --input_dim input_ids 1,1024 --input_dim attention_mask 1,1,1024,1024 --input_dim position_ids_cos 1,1,1024,64 --input_dim position_ids_sin 1,1,1024,64 --input_dtype input_ids int32 --input_dtype attention_mask float32 --input_dtype position_ids_cos float32 --input_dtype position_ids_sin float32 --bias_bw 32 --weight_bw 8 --act_bw 16 --no_simplification --quantization_overrides /tmp/868f0d23-9714-4ada-add4-922248dce525abckycxm/tmpb1d7bjoy/Llama2_PromptProcessor_1_Quantized_aimet_zip_mknj5zw1m.aimet/Llama2_PromptProcessor_1_Quantized.encodings [2024-07-03 12:09:52,826] [INFO] 2024-07-03 12:02:33,083 - 235 - INFO - Processing user provided quantization encodings: 2024-07-03 12:02:41,990 - 240 - WARNING - Symbolic shape inference Failed. Exception: Onnxruntime package not found in current environment. Symbolic Shape Inference will be skipped.. Running normal shape inference. 2024-07-03 12:05:57,292 - 235 - INFO - Processed 8671 quantization encodings IrQuantizer: Param Quantizer should be set to symmetric for 32 bit biases. 
Will ignore param quantizer option: tf for biases 2024-07-03 12:09:47,182 - 235 - INFO - Saving QNN Model... 2024-07-03 12:09:50,372 - 235 - INFO - Model CPP saved at: /tmp/868f0d23-9714-4ada-add4-922248dce525abckycxm/tmpknhvr5/tmpycc3oq6t.cpp 2024-07-03 12:09:50,372 - 235 - INFO - Model BIN saved at: /tmp/868f0d23-9714-4ada-add4-922248dce525abckycxm/tmpknh__vr5/tmpycc3oq6t.bin 2024-07-03 12:09:51,585 - 235 - INFO - Conversion complete! 3.6ms [ INFO ] Inferences will run in sync mode 3.7ms [ INFO ] Initializing logging in the backend. Callback: [0x7f8178919e80], Log Level: [3] 3.7ms [ INFO ] No BackendExtensions lib provided;initializing NetRunBackend Interface 0.2ms [ INFO ] [QNN_CPU] CpuBackend creation start 0.2ms [ INFO ] [QNN_CPU] CpuBackend creation end 3.9ms [WARNING] Unable to find a device with NetRunDeviceKeyDefault in Library NetRunBackendLibKeyDefault 0.2ms [ INFO ] [QNN_CPU] QnnContext create start 0.2ms [ INFO ] [QNN_CPU] QnnContext create end 4.2ms [ INFO ] Entering QuantizeRuntimeApp flow 0.5ms [ INFO ] [QNN_CPU] CpuGraph creation start 0.5ms [ INFO ] [QNN_CPU] CpuGraph creation end 0.5ms [ INFO ] [QNN_CPU] QnnGraph create end 3188.0ms [ INFO ] [QNN_CPU] QnnGraph finalize start 7696.0ms [ INFO ] [QNN_CPU] QnnGraph finalize end 7711.8ms [ INFO ] [QNN_CPU] QnnGraph execute start 20120.7ms [ INFO ] [QNN_CPU] QnnGraph execute end 20233.8ms [ INFO ] cleaning up resources for input tensors 20234.1ms [ INFO ] cleaning up resources for output tensors 88001.4ms [ INFO ] Freeing graphsInfo 88001.9ms [ INFO ] [QNN_CPU] QnnContext Free start 88865.7ms [ INFO ] [QNN_CPU] QnnContext Free end 88865.9ms [ INFO ] [QNN_CPU] QnnBackend Free start 88865.9ms [ INFO ] [QNN_CPU] QnnBackend Free end
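
As an aside, the long `--preserve_io` tensor list in the converter command above is fully regular: one `past_key`/`past_value` entry per layer and attention head. A minimal Python sketch (illustrative only, not part of the export tooling) that reproduces the naming pattern from the log:

```python
def kv_cache_names(num_layers=8, num_heads=32):
    """Regenerate the KV-cache tensor names seen in the converter log.

    The log lists, for each transformer layer in this model split,
    all key heads followed by all value heads, named
    past_{key,value}_{layer}_h{head}_out.
    """
    names = []
    for layer in range(num_layers):
        for kind in ("key", "value"):
            for head in range(num_heads):
                names.append(f"past_{kind}_{layer}_h{head}_out")
    return names

names = kv_cache_names()
print(len(names))           # 8 layers x 2 (key/value) x 32 heads = 512
print(names[0], names[-1])  # past_key_0_h0_out past_value_7_h31_out
```

The layer and head counts here (8 and 32) are taken from the names visible in this log for the first model split; they are not a general Llama v2 constant.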

[2024-07-03 12:10:08,167] [INFO] -=- QNN Source to QNN Model Library (qnn-model-lib-generator) -=- [2024-07-03 12:10:08,167] [INFO] Running /qnn_sdk/bin/x86_64-linux-clang/qnn-model-lib-generator -c /tmp/868f0d23-9714-4ada-add4-922248dce525abckycxm/tmp7svebk1t.cpp -b /tmp/868f0d23-9714-4ada-add4-922248dce525abckycxm/tmp_5x61wdw.bin -t x86_64-linux-clang -o /tmp/868f0d23-9714-4ada-add4-922248dce525abckycxm/tmpo72vmpqh -l qnn_model [2024-07-03 12:12:00,235] [INFO] 2024-07-03 12:10:08,217 - INFO - qnn-model-lib-generator: Model cpp file path : /tmp/868f0d23-9714-4ada-add4-922248dce525abckycxm/tmp7svebk1t.cpp 2024-07-03 12:10:08,217 - INFO - qnn-model-lib-generator: Model bin file path : /tmp/868f0d23-9714-4ada-add4-922248dce525abckycxm/tmp_5x61wdw.bin 2024-07-03 12:10:08,217 - INFO - qnn-model-lib-generator: Library target : [['x86_64-linux-clang']] 2024-07-03 12:10:08,217 - INFO - qnn-model-lib-generator: Library name : qnn_model 2024-07-03 12:10:08,218 - INFO - qnn-model-lib-generator: Output directory : /tmp/868f0d23-9714-4ada-add4-922248dce525abckycxm/tmpo72vmpqh 2024-07-03 12:10:08,218 - INFO - qnn-model-lib-generator: Output library name : qnn_model 2024-07-03 12:11:58,933 - ERROR - qnn-model-lib-generator: command : ['cd /tetra/tetracode/tmp_5749 && export QNN_MODEL_LIB_NAME=libqnn_model.so && make CXX="clang++-14" -f Makefile.linux-x86_64'] 2024-07-03 12:11:58,941 - ERROR - qnn-model-lib-generator: rc : 0 2024-07-03 12:11:58,941 - ERROR - qnn-model-lib-generator: stdout : mkdir -p obj/x86_64-linux-clang clang++-14 -c -std=c++11 -march=x86-64 -O3 -fno-exceptions -fno-rtti -Wno-write-strings -DQNN_API="attribute((visibility(\"default\")))" -fPIC -fvisibility=hidden -Ijni/ -I/qnn_sdk/include/QNN jni/QnnModel.cpp -o obj/x86_64-linux-clang/QnnModel.o clang++-14 -c -std=c++11 -march=x86-64 -O3 -fno-exceptions -fno-rtti -Wno-write-strings -DQNN_API="attribute((visibility(\"default\")))" -fPIC -fvisibility=hidden -Ijni/ -I/qnn_sdk/include/QNN jni/QnnModelPal.cpp -o 
obj/x86_64-linux-clang/QnnModelPal.o clang++-14 -c -std=c++11 -march=x86-64 -O3 -fno-exceptions -fno-rtti -Wno-write-strings -DQNN_API="attribute((visibility(\"default\")))" -fPIC -fvisibility=hidden -Ijni/ -I/qnn_sdk/include/QNN jni/QnnWrapperUtils.cpp -o obj/x86_64-linux-clang/QnnWrapperUtils.o clang++-14 -c -std=c++11 -march=x86-64 -O3 -fno-exceptions -fno-rtti -Wno-write-strings -DQNN_API="attribute((visibility(\"default\")))" -fPIC -fvisibility=hidden -Ijni/ -I/qnn_sdk/include/QNN jni/tmp7svebk1t.cpp -o obj/x86_64-linux-clang/tmp7svebk1t.o mkdir -p obj/binary/x86_64-linux-clang touch obj/binary/extractbinary touch obj/binary/x86_64-linux-clang/objcopyDone mkdir -p libs/x86_64-linux-clang

2024-07-03 12:11:58,953 - ERROR - qnn-model-lib-generator: stderr : jni/tmp7svebk1t.cpp:37:51: warning: mixture of designated and non-designated initializers in the same initializer list is a C99 extension [-Wc99-designator] .hybridCoo= {.numSpecifiedElements= 0, .numSparseDimensions= 0}}, ^~~~~~~~~~~~~~~ jni/QnnWrapperUtils.hpp:77:17: note: expanded from macro 'VALIDATE' retStatus = value; \ ^~~~~ jni/tmp7svebk1t.cpp:36:51: note: first non-designated initializer is here .sparseParams= { QNN_SPARSE_LAYOUT_UNDEFINED, ^~~~~~~ .....

mestrona-3 commented 2 days ago

Hi @taeyeonlee, these are expected warnings in the compile log; since the compile job succeeded, they are nothing to worry about. We don't support a model library as an output for Llama v2 (it's an intermediate step toward the context binary, which is why you see these messages), which is why you haven't received a .so file.

taeyeonlee commented 2 days ago

Hi @mestrona-3, thanks for the info. Could you please share how to run the QNN Source (CPP, BIN) on my Android device (Snapdragon 8 Gen 3 Mobile)?