quic / ai-hub-models

The Qualcomm® AI Hub Models are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) and ready to deploy on Qualcomm® devices.
https://aihub.qualcomm.com
BSD 3-Clause "New" or "Revised" License

[Genie] fails to generate genie-compatible QNN binaries #98

Open taeyeonlee opened 2 days ago

taeyeonlee commented 2 days ago

Dear Qualcomm,

It fails to generate Genie-compatible QNN binaries when following the guide (https://github.com/quic/ai-hub-models/tree/main/qai_hub_models/models/llama_v2_7b_chat_quantized/gen_ondevice_llama). The error is: FileNotFoundError: Unable to find the model source file, invalid path: /mnt/hdd/QCT_AI_Hub/ai-hub-models/qai_hub_models/models/llama_v2_7b_chat_quantized/gen_ondevice_llama/export/intermediate_data/cpp_models/Llama2_PromptProcessor_1_Quantized.cpp

    The ONNX file exists: /export/intermediate_data/input_models/model_pp_0/Llama2_PromptProcessor_1_Quantized.aimet/Llama2_PromptProcessor_1_Quantized.onnx
    but the CPP file is missing: /export/intermediate_data/cpp_models/Llama2_PromptProcessor_1_Quantized.cpp
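
For reference, a quick local check (not part of the repo scripts) of which converter outputs are present, using the paths from the log below:

    # Quick check: list which expected converter outputs exist.
    # Paths are copied from the log below; adjust if your output dir differs.
    from pathlib import Path

    cpp_dir = Path("./export/intermediate_data/cpp_models")
    for name in ("Llama2_PromptProcessor_1_Quantized.cpp",
                 "Llama2_PromptProcessor_1_Quantized.bin"):
        path = cpp_dir / name
        print(f"{path}: {'found' if path.exists() else 'MISSING'}")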

Error log


(qct_python310_VENV_root) taeyeon@taeyeon-PC:/hdd/hdd/QCT_AI_Hub/ai-hub-models/qai_hub_models/models/llama_v2_7b_chat_quantized/gen_ondevice_llama$ python gen_ondevice_llama.py --hub-model-id mjqyxjlvm,mdno2o9gq,m7n1x51rq,m7qk5ozxq,mpn7y95jn,mknj6orxq,mrmdwz90q,mrmdwo8oq --output-dir ./export --tokenizer-zip-path ./tokenizer.zip --target-gen snapdragon-gen3 --target-os android
Using previously extracted model
Generating model lib for split-0 pp
./export/intermediate_data/input_data/data_pp_0.h5 export/intermediate_data/input_data/data_pp_0/_data
/opt/qcom/aistack/qairt/2.25.0.240728/bin/x86_64-linux-clang/qnn-onnx-converter --input_network ./export/intermediate_data/input_models/model_pp_0/Llama2_PromptProcessor_1_Quantized.aimet/Llama2_PromptProcessor_1_Quantized.onnx --output_path ./export/intermediate_data/cpp_models/Llama2_PromptProcessor_1_Quantized.cpp --no_simplification --quantization_overrides ./export/intermediate_data/input_models/model_pp_0/Llama2_PromptProcessor_1_Quantized.aimet/Llama2_PromptProcessor_1_Quantized.encodings --preserve_io layout input_ids attention_mask position_ids_cos position_ids_sin layers_7_add_out_0 past_key_0_out past_value_0_out past_key_1_out past_value_1_out past_key_2_out past_value_2_out past_key_3_out past_value_3_out past_key_4_out past_value_4_out past_key_5_out past_value_5_out past_key_6_out past_value_6_out past_key_7_out past_value_7_out --input_layout attention_mask NONTRIVIAL --input_layout position_ids_cos NONTRIVIAL --input_layout position_ids_sin NONTRIVIAL --input_list export/intermediate_data/input_data/data_pp_0/_data/input_list.txt --input_dim input_ids 1,1024 --input_dim attention_mask 1,1,1024,1024 --input_dim position_ids_cos 1,1,1024,64 --input_dim position_ids_sin 1,1,1024,64 --input_dtype input_ids int32 --input_dtype attention_mask float32 --input_dtype position_ids_cos float32 --input_dtype position_ids_sin float32 --bias_bitwidth 32 --weight_bw 8 --act_bitwidth 16
2024-09-12 11:44:39,286 - 235 - INFO - Processing user provided quantization encodings:
2024-09-12 11:48:16,882 - 240 - WARNING - WARNING_CAST_TYPE: Only numerical type cast is supported. The op: /model/model/layers.0/input_layernorm/Cast will be interpreted at conversion time
2024-09-12 11:48:16,892 - 240 - WARNING - WARNING_CAST_TYPE: Only numerical type cast is supported. The op: /model/model/layers.0/input_layernorm/Cast_1 will be interpreted at conversion time
2024-09-12 11:48:17,879 - 240 - WARNING - WARNING_CAST_TYPE: Only numerical type cast is supported. The op: /model/model/layers.0/self_attn/Cast will be interpreted at conversion time
...
2024-09-12 11:48:27,826 - 240 - WARNING - WARNING_CAST_TYPE: Only numerical type cast is supported. The op: /model/model/layers.7/post_attention_layernorm/Cast_1 will be interpreted at conversion time
2024-09-12 11:48:37,409 - 235 - INFO - Processed 7623 quantization encodings
2024-09-12 11:50:31,235 - 240 - WARNING - --weight_bw option is deprecated, use --weights_bitwidth.
IrQuantizer: Param Quantizer should be set to symmetric for 32 bit biases. Will ignore param quantizer option: tf for biases
    28.2ms [  INFO ] Inferences will run in sync mode
    28.9ms [  INFO ] Initializing logging in the backend. Callback: [0x710b47a96770], Log Level: [3]
    29.0ms [  INFO ] No BackendExtensions lib provided;initializing NetRunBackend Interface
     2.4ms [  INFO ] [QNN_CPU] CpuBackend creation start
     2.4ms [  INFO ] [QNN_CPU] CpuBackend creation end
    31.4ms [WARNING] Unable to find a device with NetRunDeviceKeyDefault in Library NetRunBackendLibKeyDefault
    31.4ms [WARNING] Profile Logger with name = defaultKey doesn't exist! Returning nullptr
     3.8ms [  INFO ] [QNN_CPU] QnnContext create start
     3.8ms [  INFO ] [QNN_CPU] QnnContext create end
    33.2ms [  INFO ] Entering QuantizeRuntimeApp flow
    33.2ms [WARNING] Profile Logger with name = defaultKey doesn't exist! Returning nullptr
     4.3ms [  INFO ] [QNN_CPU] CpuGraph creation start
     4.7ms [  INFO ] [QNN_CPU] CpuGraph creation end
     4.7ms [  INFO ] [QNN_CPU] QnnGraph create end
  5407.6ms [  INFO ] [QNN_CPU] QnnGraph finalize start
Generating model lib...
/opt/qcom/aistack/qairt/2.25.0.240728/bin/x86_64-linux-clang/qnn-model-lib-generator -c ./export/intermediate_data/cpp_models/Llama2_PromptProcessor_1_Quantized.cpp -b ./export/intermediate_data/cpp_models/Llama2_PromptProcessor_1_Quantized.bin -t x86_64-linux-clang -o ./export/intermediate_data/model_libs -l Llama2_PromptProcessor_1_Quantized
2024-09-12 11:52:16,354 -    INFO - qnn-model-lib-generator: Model cpp file path  : export/intermediate_data/cpp_models/Llama2_PromptProcessor_1_Quantized.cpp
2024-09-12 11:52:16,354 -    INFO - qnn-model-lib-generator: Model bin file path  : export/intermediate_data/cpp_models/Llama2_PromptProcessor_1_Quantized.bin
2024-09-12 11:52:16,354 -    INFO - qnn-model-lib-generator: Library target       : [['x86_64-linux-clang']]
2024-09-12 11:52:16,354 -    INFO - qnn-model-lib-generator: Library name         : Llama2_PromptProcessor_1_Quantized
2024-09-12 11:52:16,354 -    INFO - qnn-model-lib-generator: Output directory     : export/intermediate_data/model_libs
Traceback (most recent call last):
  File "/opt/qcom/aistack/qairt/2.25.0.240728/bin/x86_64-linux-clang/qnn-model-lib-generator", line 495, in <module>
    main()
  File "/opt/qcom/aistack/qairt/2.25.0.240728/bin/x86_64-linux-clang/qnn-model-lib-generator", line 489, in main
    result = generator.build_targets(config)
  File "/opt/qcom/aistack/qairt/2.25.0.240728/bin/x86_64-linux-clang/qnn-model-lib-generator", line 393, in build_targets
    self._normalize_config(config)
  File "/opt/qcom/aistack/qairt/2.25.0.240728/bin/x86_64-linux-clang/qnn-model-lib-generator", line 361, in _normalize_config
    raise FileNotFoundError(f'Unable to find the model source file, invalid path: {config.model_cpp.absolute()}')
FileNotFoundError: Unable to find the model source file, invalid path: /mnt/hdd/QCT_AI_Hub/ai-hub-models/qai_hub_models/models/llama_v2_7b_chat_quantized/gen_ondevice_llama/export/intermediate_data/cpp_models/Llama2_PromptProcessor_1_Quantized.cpp
Traceback (most recent call last):
  File "/mnt/hdd/QCT_AI_Hub/ai-hub-models/qai_hub_models/models/llama_v2_7b_chat_quantized/gen_ondevice_llama/gen_ondevice_llama.py", line 67, in <module>
    main()
  File "/mnt/hdd/QCT_AI_Hub/ai-hub-models/qai_hub_models/models/llama_v2_7b_chat_quantized/gen_ondevice_llama/gen_ondevice_llama.py", line 61, in main
    generate_shared_bins(
  File "/mnt/hdd/QCT_AI_Hub/ai-hub-models/qai_hub_models/models/llama_v2_7b_chat_quantized/gen_ondevice_llama/utils.py", line 546, in generate_shared_bins
    generate_lib(
  File "/mnt/hdd/QCT_AI_Hub/ai-hub-models/qai_hub_models/models/llama_v2_7b_chat_quantized/gen_ondevice_llama/utils.py", line 276, in generate_lib
    raise RuntimeError("The QNN graph compiler did not produce the output file")
RuntimeError: The QNN graph compiler did not produce the output file
(qct_python310_VENV_root) taeyeon@taeyeon-PC:~/hdd/hdd/QCT_AI_Hub/ai-hub-models/qai_hub_models/models/llama_v2_7b_chat_quantized/gen_ondevice_llama$

Best regards,

gustavla commented 19 hours ago

Hi @taeyeonlee,

Something is causing the converter to exit abruptly before it has finished. The scripts should check the exit status before trying to proceed, so we should fix that on our end. However, that will only make the failure reason clearer; it won't explain why the converter is failing.
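
(For illustration only, a minimal sketch of the kind of check meant here, assuming the pipeline drives qnn-onnx-converter through subprocess; the helper name is hypothetical and the actual code in utils.py may be organized differently.)

    # Sketch only: fail fast if the converter exits non-zero, instead of
    # silently continuing to qnn-model-lib-generator.
    import subprocess

    def run_converter_checked(cmd: list[str]) -> None:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            raise RuntimeError(
                f"qnn-onnx-converter exited with status {result.returncode}; "
                "not continuing to qnn-model-lib-generator"
            )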

The qnn-onnx-converter step should end with outputs like this:

2024-05-14 19:12:07,107 - 235 - INFO - Model CPP saved at: [...].cpp 
2024-05-14 19:12:07,108 - 235 - INFO - Model BIN saved at: [...].bin 
2024-05-14 19:12:07,136 - 235 - INFO - Conversion complete!
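
If you don't see those lines, one way (just a sketch) to re-run only the failing converter step and surface its exit status and error output is something like the following; paste the full argument list exactly as printed in your log above:

    # Sketch: re-run the qnn-onnx-converter invocation from the log above on
    # its own and print the exit status plus the tail of stderr.
    import subprocess

    cmd = [
        "/opt/qcom/aistack/qairt/2.25.0.240728/bin/x86_64-linux-clang/qnn-onnx-converter",
        # ... append every flag exactly as printed in the log above ...
    ]
    completed = subprocess.run(cmd, capture_output=True, text=True)
    print("exit status:", completed.returncode)
    print("\n".join(completed.stderr.splitlines()[-30:]))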

A few things to try:

Let us know what that yields and we'll take it from there. Sorry again about this!