quic / ai-hub-apps

The Qualcomm® AI Hub apps are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) and ready to deploy on Qualcomm® devices.
BSD 3-Clause "New" or "Revised" License

graph_prepare.cc:742:ERROR:error during serialize: memory usage too large #16

Open zqxuturbo opened 1 week ago

zqxuturbo commented 1 week ago

Hi, when I convert llama2-7b for the target devices 8gen2, 8295, and 8775 (HTP v73, SoC 43), a memory error occurs:

```
2024-11-13 14:49:27,866 - INFO - qnn-model-lib-generator: Target: x86_64-linux-clang Library: /tmp/db824256-3d3a-4b91-a94a-4a841d7d04e5fygbm454/tmp_7_tx2cu/x86_64-linux-clang/libqnn_model.so
[2024-11-13 14:49:58,453] [INFO] Saving model
[2024-11-13 14:50:10,731] [INFO] Graph name: prompt_part1
[2024-11-13 14:50:10,733] [INFO] -=- QNN Model Libraries to QNN Context Binary (qnn-context-binary-generator) -=-
[2024-11-13 14:50:10,734] [INFO] Contents of HTP Settings: {'graphs': [{'graph_names': ['prompt_part1'], 'fp16_relaxed_precision': 1, 'vtcm_mb': 0, 'O': 3}], 'devices': [{'dsp_arch': 'v73', 'soc_model': 43}]}
[2024-11-13 14:50:10,734] [INFO] Contents of HTP Config file used: {'backend_extensions': {'shared_library_path': 'libQnnHtpNetRunExtensions.so', 'config_file_path': '/tmp/db824256-3d3a-4b91-a94a-4a841d7d04e5fygbm454/tmpyvclsp5b/htp_setting.json'}, 'context_configs': {'enable_graphs': ['prompt_part1']}, 'graph_configs': [{'graph_name': 'prompt_part1'}], 'memory': {'mem_type': 'shared_buffer'}}
[2024-11-13 14:50:10,734] [INFO] Running /qnn_sdk/bin/x86_64-linux-clang/qnn-context-binary-generator --backend /qnn_sdk/lib/x86_64-linux-clang/libQnnHtp.so --model /tmp/db824256-3d3a-4b91-a94a-4a841d7d04e5fygbm454/tmpmuz4pbse.so --output_dir /tmp/db824256-3d3a-4b91-a94a-4a841d7d04e5fygbm454/tmpyvclsp5b --binary_file qnn_model --config_file /tmp/db824256-3d3a-4b91-a94a-4a841d7d04e5fygbm454/tmpyvclsp5b/htp_context.json
[2024-11-13 14:56:26,382] [INFO] qnn-context-binary-generator pid:15974
     0.0ms [ ERROR ] fa_alloc.cc:3866:ERROR:graph requires estimated allocation of 2473687 KB, limit is 2097152 KB
     0.0ms [ ERROR ] graph_prepare.cc:742:ERROR:error during serialize: memory usage too large
     0.0ms [ ERROR ] graph_prepare.cc:6095:ERROR:Serialize error: memory usage too large
     0.0ms [ ERROR ] QnnDsp <E> Graph prompt_part1 serialization failed
     0.0ms [ ERROR ] QnnDsp <E> Failed to serialize graph prompt_part1
     0.0ms [ ERROR ] QnnDsp <E> Context binary serialization failed
     0.0ms [ ERROR ] QnnDsp <E> Get context blob failed.
     0.0ms [ ERROR ] QnnDsp <E> Failed to get serialized binary
     0.0ms [ ERROR ] QnnDsp <E> Failed to get context binary with err 0x138f
374134.5ms [ ERROR ] Could not get binary. Graph Finalize failure
[2024-11-13 14:56:26,560] [ERROR] Conversion to context binary failed with exit code 15
```

bhushan23 commented 5 days ago

Thanks @zqxuturbo for reporting this. We are aware of this issue; the root cause is the bundling of I/O for the past key values. This happens because of the large number of key-value heads combined with the large input sequence length.

We are fixing this in the next release.
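To see why the past-key-value I/O gets so large, here is a rough back-of-the-envelope sketch (not part of the AI Hub tooling; `kv_cache_bytes` is a hypothetical helper). It assumes the published Llama 2 7B configuration (32 layers, 32 KV heads, head dimension 128) and fp16 storage; the actual graph allocation also includes weights and activations, so this only illustrates the scaling with heads and sequence length:

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Estimate past-key-value buffer size.

    Each layer keeps two tensors (K and V) of shape
    [num_kv_heads, seq_len, head_dim]; bytes_per_elem=2 assumes fp16.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem

# Llama 2 7B has 32 layers and 32 KV heads (no grouped-query attention),
# so a 2048-token context alone needs about 1 GiB of KV buffers:
total = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128, seq_len=2048)
print(f"{total / 2**20:.0f} MiB")  # prints "1024"
```

With the KV buffers bundled into graph I/O, this sits on top of everything else the graph allocates, which is consistent with the log's estimated 2473687 KB exceeding the 2097152 KB (2 GiB) limit.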