Open zqxuturbo opened 1 week ago
Thanks @zqxuturbo for reporting this. We are aware of this issue; the root cause is the bundling of I/O for the past key/value tensors. It happens when the number of key/value heads is large and the input sequence length is long.
We are fixing this in the next release.
Hi, when I convert llama2-7b with target devices 8gen2, 8295, and 8775 (htp v73, soc 43), a memory error occurs:
2024-11-13 14:49:27,866 - INFO - qnn-model-lib-generator: Target: x86_64-linux-clang Library: /tmp/db824256-3d3a-4b91-a94a-4a841d7d04e5fygbm454/tmp_7_tx2cu/x86_64-linux-clang/libqnn_model.so
[2024-11-13 14:49:58,453] [INFO] Saving model
[2024-11-13 14:50:10,731] [INFO] Graph name: prompt_part1
[2024-11-13 14:50:10,733] [INFO] -=- QNN Model Libraries to QNN Context Binary (qnn-context-binary-generator) -=-
[2024-11-13 14:50:10,734] [INFO] Contents of HTP Settings: {'graphs': [{'graph_names': ['prompt_part1'], 'fp16_relaxed_precision': 1, 'vtcm_mb': 0, 'O': 3}], 'devices': [{'dsp_arch': 'v73', 'soc_model': 43}]}
[2024-11-13 14:50:10,734] [INFO] Contents of HTP Config file used: {'backend_extensions': {'shared_library_path': 'libQnnHtpNetRunExtensions.so', 'config_file_path': '/tmp/db824256-3d3a-4b91-a94a-4a841d7d04e5fygbm454/tmpyvclsp5b/htp_setting.json'}, 'context_configs': {'enable_graphs': ['prompt_part1']}, 'graph_configs': [{'graph_name': 'prompt_part1'}], 'memory': {'mem_type': 'shared_buffer'}}
[2024-11-13 14:50:10,734] [INFO] Running /qnn_sdk/bin/x86_64-linux-clang/qnn-context-binary-generator --backend /qnn_sdk/lib/x86_64-linux-clang/libQnnHtp.so --model /tmp/db824256-3d3a-4b91-a94a-4a841d7d04e5fygbm454/tmpmuz4pbse.so --output_dir /tmp/db824256-3d3a-4b91-a94a-4a841d7d04e5fygbm454/tmpyvclsp5b --binary_file qnn_model --config_file /tmp/db824256-3d3a-4b91-a94a-4a841d7d04e5fygbm454/tmpyvclsp5b/htp_context.json
[2024-11-13 14:56:26,382] [INFO] qnn-context-binary-generator pid:15974
0.0ms [ ERROR ] fa_alloc.cc:3866:ERROR:graph requires estimated allocation of 2473687 KB, limit is 2097152 KB
374134.5ms [ ERROR ] Could not get binary. Graph Finalize failure
[2024-11-13 14:56:26,560] [ERROR] Conversion to context binary failed with exit code 15
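For anyone reading the numbers in the `fa_alloc.cc` error, here is a small sketch that pulls the two figures out of that log line and shows how far the graph is over the limit. The regex and the helper name `allocation_overshoot` are my own for illustration, not part of the QNN SDK, and the pattern assumes the message wording stays exactly as in the log above.

```python
import re

# Error line copied verbatim from the log in this issue.
LOG_LINE = (
    "0.0ms [ ERROR ] fa_alloc.cc:3866:ERROR:graph requires estimated "
    "allocation of 2473687 KB, limit is 2097152 KB"
)

# Assumption: the message format is stable; this pattern is not an SDK API.
PATTERN = re.compile(r"estimated allocation of (\d+) KB, limit is (\d+) KB")


def allocation_overshoot(line):
    """Return (required_kb, limit_kb, excess_kb, ratio), or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    required_kb = int(match.group(1))
    limit_kb = int(match.group(2))
    return required_kb, limit_kb, required_kb - limit_kb, required_kb / limit_kb


if __name__ == "__main__":
    required_kb, limit_kb, excess_kb, ratio = allocation_overshoot(LOG_LINE)
    print(f"required: {required_kb / 1024:.1f} MB")
    print(f"limit:    {limit_kb / 1024:.1f} MB")
    print(f"excess:   {excess_kb} KB ({(ratio - 1) * 100:.1f}% over the limit)")
```

The limit of 2097152 KB is exactly 2048 × 1024 KB, and the estimated allocation exceeds it by 376535 KB, roughly 18%.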