pytorch / executorch

On-device AI across mobile, embedded and edge for PyTorch
https://pytorch.org/executorch/

Model conversion failure on the MTK platform when adapting the Qwen 2.5 0.5B model #6228

Open tiger-of-shawn opened 3 days ago

tiger-of-shawn commented 3 days ago

Some logs:

```
source shell_scripts/export_llama.sh qwen2 "" "" "" llama3.txt
```

```
checkpoint_files: ['models/llm_models/weights/Qwen2.5-0.5B-Instruct/model.safetensors']
Preparing Model Calibration Inputs...
Exporting Chunk 0 to PTE
Getting pre autograd ATen Dialect Graph
model info: Qwen2ModelChunk(
  (layers): ModuleList(
    (0-23): 24 x Qwen2DecoderLayer(
      (self_attn): Qwen2Attention(
        (q_proj): Linear(in_features=896, out_features=896, bias=True)
        (k_proj): Linear(in_features=896, out_features=128, bias=True)
        (v_proj): Linear(in_features=896, out_features=128, bias=True)
        (o_proj): Linear(in_features=896, out_features=896, bias=False)
      )
      (mlp): Qwen2MLP(
        (gate_proj): Linear(in_features=896, out_features=4864, bias=False)
        (down_proj): Linear(in_features=4864, out_features=896, bias=False)
        (up_proj): Linear(in_features=896, out_features=4864, bias=False)
      )
      (input_norm): RMSNorm()
      (post_attention_norm): RMSNorm()
    )
  )
  (norm): RMSNorm()
  (lm_head): Linear(in_features=896, out_features=151936, bias=False)
)
```

```
W1015 10:29:36.177991 578378 torch/_export/__init__.py:64] +============================+
W1015 10:29:36.178128 578378 torch/_export/__init__.py:65] |      !!! WARNING !!!       |
W1015 10:29:36.178169 578378 torch/_export/__init__.py:66] +============================+
W1015 10:29:36.178198 578378 torch/_export/__init__.py:67] capture_pre_autograd_graph() is deprecated and doesn't provide any function guarantee moving forward.
W1015 10:29:36.178226 578378 torch/_export/__init__.py:68] Please switch to use torch.export.export_for_training instead.
Batch: 100%|██████████| 10/10 [00:05<00:00, 1.86it/s]
Calibrating Model: 100%|██████████| 1/1 [00:13<00:00, 13.90s/it]
Getting ATen Dialect Graph
Exporting Shape 128t512c to: pte/Qwen2.5-0.5B-Instruct_A16W4_1_chunks_128t512c/Qwen2.5-0.5B-Instruct_A16W4_1_chunks_128t512c_0.pte
example_input shape: torch.Size([1, 128, 896])
Lowering to Edge Dialect Graph
Delegating Edge Program to Neuropilot Backend
```

```
Traceback (most recent call last):
  File "/home/qwen/executorch/examples/mediatek/model_export_scripts/qwen2.py", line 491, in <module>
    main()
  File "/home/qwen/executorch/examples/mediatek/model_export_scripts/qwen2.py", line 477, in main
    export_to_et_ir(
  File "/home/qwen/executorch/examples/mediatek/model_export_scripts/qwen2.py", line 362, in export_to_et_ir
    delegated_program = edge_program.to_backend(partitioner)
  File "/home/qwen/miniconda3/envs/et_qnn_2/lib/python3.10/site-packages/executorch/exir/program/_program.py", line 1288, in to_backend
    new_edge_programs[name] = to_backend(program, partitioner)
  File "/home/qwen/miniconda3/envs/et_qnn_2/lib/python3.10/functools.py", line 878, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/qwen/miniconda3/envs/et_qnn_2/lib/python3.10/site-packages/executorch/exir/backend/backend_api.py", line 387, in _
    tagged_graph_module = _partition_and_lower(
  File "/home/qwen/miniconda3/envs/et_qnn_2/lib/python3.10/site-packages/executorch/exir/backend/backend_api.py", line 310, in _partition_and_lower
    partitioned_module = _partition_and_lower_one_graph_module(
  File "/home/qwen/miniconda3/envs/et_qnn_2/lib/python3.10/site-packages/executorch/exir/backend/backend_api.py", line 249, in _partition_and_lower_one_graph_module
    lowered_submodule = to_backend(
  File "/home/qwen/miniconda3/envs/et_qnn_2/lib/python3.10/functools.py", line 878, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/qwen/miniconda3/envs/et_qnn_2/lib/python3.10/site-packages/executorch/exir/backend/backend_api.py", line 113, in _
    preprocess_result: PreprocessResult = cls.preprocess(
  File "/home/qwen/miniconda3/envs/et_qnn_2/lib/python3.10/site-packages/executorch/backends/mediatek/preprocess.py", line 68, in preprocess
    model_bytes = mtk_neuron.compile(mlir_str, " ".join(compile_options))
  File "/home/qwen/miniconda3/envs/et_qnn_2/lib/python3.10/site-packages/mtk_neuron/mtk_neuron.py", line 127, in compile
    raise RuntimeError(f'Compile error:\n{status["log"]}')
RuntimeError: Compile error:
NIR[1761]: FullyConnectedLayer
 ├ MDLA: Dimension should be <= 65535. Operand: 1 got <151936 x 896>.
 ├ MDLA: Dimension should be <= 65535. Result : 0 got <128 x 151936>.
 ├ EDPA: unsupported operation
WARNING: Failed to process the supernode.
```
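For reference, here is a purely illustrative sketch (my own reading of the error, not the MediaTek compiler's actual check) of the constraint being violated: the MDLA backend rejects FullyConnected tensors with any dimension above 65535, and the Qwen2.5 lm_head projects hidden_size=896 to vocab_size=151936, which breaks that limit.

```python
# Illustrative only: reproduce the dimension check implied by the compile error.
MDLA_MAX_DIM = 65535          # per-dimension limit quoted in the error message

hidden_size = 896             # Qwen2.5-0.5B hidden size (see the model dump above)
vocab_size = 151936           # Qwen2.5 vocabulary size
seq_len = 128                 # prefill chunk length used in this export (128t)

shapes = {
    "lm_head weight (Operand 1)": (vocab_size, hidden_size),  # <151936 x 896>
    "lm_head output (Result 0)": (seq_len, vocab_size),       # <128 x 151936>
}

for name, shape in shapes.items():
    too_big = [d for d in shape if d > MDLA_MAX_DIM]
    if too_big:
        print(f"{name} {shape}: dimension(s) {too_big} exceed the MDLA limit of {MDLA_MAX_DIM}")
```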

tiger-of-shawn commented 2 days ago

The error seems to be related to `tie_word_embeddings`. I will try to work on a fix soon.
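As a hedged sketch of one possible direction for such a fix (my assumption, not the actual patch that was shipped): split the tied lm_head along the vocabulary axis so that every FullyConnected the backend sees stays under the 65535-dimension limit, then concatenate the partial logits.

```python
import torch
import torch.nn as nn


class ChunkedLMHead(nn.Module):
    """Replace a single Linear(hidden, vocab) with several smaller Linears (hypothetical helper)."""

    def __init__(self, hidden_size: int, vocab_size: int, num_chunks: int = 4):
        super().__init__()
        assert vocab_size % num_chunks == 0, "choose num_chunks that divides vocab_size"
        self.chunks = nn.ModuleList(
            nn.Linear(hidden_size, vocab_size // num_chunks, bias=False)
            for _ in range(num_chunks)
        )

    @classmethod
    def from_linear(cls, lm_head: nn.Linear, num_chunks: int = 4) -> "ChunkedLMHead":
        # Copy the (possibly tied) weight slice by slice; the tie to the input
        # embedding is broken in the exported graph, since each chunk owns a copy.
        out = cls(lm_head.in_features, lm_head.out_features, num_chunks)
        for i, chunk in enumerate(out.chunks):
            start = i * chunk.out_features
            chunk.weight.data.copy_(lm_head.weight.data[start : start + chunk.out_features])
        return out

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Each chunk produces [batch, seq, vocab/num_chunks]; concatenate to full logits.
        return torch.cat([chunk(hidden_states) for chunk in self.chunks], dim=-1)
```

With num_chunks=4, each slice is 151936 / 4 = 37984 outputs, comfortably below 65535; the trade-off is that the exported graph no longer shares weights between the embedding and the output projection.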

neuropilot-captain commented 1 day ago

Hi, @tiger-of-shawn, thanks for your feedback! We have released the latest NeuroPilot Express SDK for ExecuTorch. This update includes optimizations specifically addressing the issue you highlighted. Please give it a try!

tiger-of-shawn commented 23 hours ago

> Hi, @tiger-of-shawn, thanks for your feedback! We have released the latest NeuroPilot Express SDK for ExecuTorch. This update includes optimizations specifically addressing the issue you highlighted. Please give it a try!

Thank you for your response; it’s working perfectly now.

I have run the sample application on MTK 9000: prefill ~990 tokens/s, decode ~61 tokens/s.

```
I 00:00:01.045045 executorch:mtk_llama_executor_runner.cpp:194] Done analyzing prompt in 0.129182 sec (990.850118 tok/s)
I 00:00:04.956007 executorch:mtk_llama_executor_runner.cpp:296] Token generation speed: 61.639103 tok/s
```
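A quick sanity check on those numbers (assuming tok/s here is simply token count divided by wall-clock time, the usual definition):

```python
# Back-of-the-envelope check of the runner output above.
prefill_time_s = 0.129182
prefill_tok_s = 990.850118
print(round(prefill_time_s * prefill_tok_s))  # ~128 tokens, matching the 128t prefill chunk

decode_tok_s = 61.639103
print(1000.0 / decode_tok_s)                  # ~16 ms per generated token
```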