pytorch / executorch

On-device AI across mobile, embedded and edge for PyTorch
https://pytorch.org/executorch/

QNN backend llama support #3048

Closed: leigao97 closed this issue 5 months ago

leigao97 commented 5 months ago

I am running the dummy_llama2.py example provided for the QNN backend. Here is the command I am using:

python -m examples.qualcomm.scripts.dummy_llama2 -F -c -m SM8650 -b ./build_android

where the -F flag enables FP16 mode.

[WARNING] The module of llama is changing frequently. This script might not work
QNN_SDK_ROOT=/opt/qcom/aistack/qnn/2.20.0.240223
LD_LIBRARY_PATH=/opt/qcom/aistack/qnn/2.20.0.240223/lib/x86_64-linux-clang/:/opt/qcom/aistack/qnn/2.14.2.230905/lib/x86_64-linux-clang/:
[INFO] [Qnn ExecuTorch]: create QNN Logger with log_level 2
[WARNING] [Qnn ExecuTorch]: QnnDsp <W> Initializing HtpProvider

[INFO] [Qnn ExecuTorch]: Initialize Qnn backend parameters for Qnn executorch backend type 2
[INFO] [Qnn ExecuTorch]: Caching: Caching is in SAVE MODE.
[WARNING] [Qnn ExecuTorch]: QnnDsp <W> Performance Estimates unsupported

[WARNING] [Qnn ExecuTorch]: QnnDsp <W> Arch 68 set by custom config is different from arch associated with SoC 57, will overwrite it to 75

[INFO] [Qnn ExecuTorch]: Running level=3 optimization.
[QNN Partitioner Op Support]: aten.view_copy.default | True
[QNN Partitioner Op Support]: aten.linear.default | True
[QNN Partitioner Op Support]: aten.view_copy.default | True
[QNN Partitioner Op Support]: aten.mul.Tensor | True
[QNN Partitioner Op Support]: aten.mul.Tensor | True
[ERROR] [Qnn ExecuTorch]: QnnDsp <E> Output[0] has incorrect Datatype 0x232.

[QNN Partitioner Op Support]: aten.rsqrt.default | True
[QNN Partitioner Op Support]: aten.add.Tensor | True
[QNN Partitioner Op Support]: aten.mean.dim | True
[QNN Partitioner Op Support]: aten.mul.Tensor | True
[QNN Partitioner Op Support]: aten.add.Tensor | True
[QNN Partitioner Op Support]: aten.view_copy.default | True
[QNN Partitioner Op Support]: aten.linear.default | True
[QNN Partitioner Op Support]: aten.view_copy.default | True
[QNN Partitioner Op Support]: aten.mul.Tensor | True
[QNN Partitioner Op Support]: aten.view_copy.default | True
[QNN Partitioner Op Support]: aten.linear.default | True
[QNN Partitioner Op Support]: aten.view_copy.default | True
[QNN Partitioner Op Support]: aten.mul.Tensor | True
[QNN Partitioner Op Support]: aten.sigmoid.default | True
[QNN Partitioner Op Support]: aten.view_copy.default | True
[QNN Partitioner Op Support]: aten.linear.default | True
[QNN Partitioner Op Support]: aten.view_copy.default | True
[QNN Partitioner Op Support]: aten.mul.Tensor | True
[QNN Partitioner Op Support]: aten.mul.Tensor | True
[ERROR] [Qnn ExecuTorch]: QnnDsp <E> Output[0] has incorrect Datatype 0x232.

[QNN Partitioner Op Support]: aten.rsqrt.default | True
[QNN Partitioner Op Support]: aten.add.Tensor | True
[QNN Partitioner Op Support]: aten.mean.dim | True
[QNN Partitioner Op Support]: aten.mul.Tensor | True
[QNN Partitioner Op Support]: aten.add.Tensor | True
[QNN Partitioner Op Support]: aten.view_copy.default | True
[QNN Partitioner Op Support]: aten.linear.default | True
[QNN Partitioner Op Support]: aten.view_copy.default | True
[QNN Partitioner Op Support]: aten.view_copy.default | True
[QNN Partitioner Op Support]: aten.permute_copy.default | True
[QNN Partitioner Op Support]: aten.permute_copy.default | True
[QNN Partitioner Op Support]: aten.permute_copy.default | True
[QNN Partitioner Op Support]: aten.view_copy.default | True
[QNN Partitioner Op Support]: aten.bmm.default | True
[QNN Partitioner Op Support]: aten.view_copy.default | True
[QNN Partitioner Op Support]: aten.expand_copy.default | True
[QNN Partitioner Op Support]: aten.view_copy.default | True
[QNN Partitioner Op Support]: aten.expand_copy.default | True
[QNN Partitioner Op Support]: aten._softmax.default | True
[QNN Partitioner Op Support]: aten.add.Tensor | True
[QNN Partitioner Op Support]: aten.view_copy.default | True
[QNN Partitioner Op Support]: aten.bmm.default | True
[QNN Partitioner Op Support]: aten.view_copy.default | True
[QNN Partitioner Op Support]: aten.expand_copy.default | True
[QNN Partitioner Op Support]: aten.view_copy.default | True
[QNN Partitioner Op Support]: aten.expand_copy.default | True
[QNN Partitioner Op Support]: aten.mul.Tensor | True
[QNN Partitioner Op Support]: aten.permute_copy.default | True
[QNN Partitioner Op Support]: aten.mul.Tensor | True
Traceback (most recent call last):
  File "/home/lei/miniconda3/envs/executorch/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/lei/miniconda3/envs/executorch/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/lei/executorch/examples/qualcomm/scripts/dummy_llama2.py", line 129, in <module>
    build_executorch_binary(
  File "/home/lei/executorch/examples/qualcomm/scripts/utils.py", line 215, in build_executorch_binary
    edge_prog.exported_program = to_backend(edge_prog.exported_program, qnn_partitioner)
  File "/home/lei/miniconda3/envs/executorch/lib/python3.10/functools.py", line 878, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/lei/executorch/exir/backend/backend_api.py", line 363, in _
    partitioner_result = partitioner_instance(fake_edge_program)
  File "/home/lei/executorch/exir/backend/partitioner.py", line 64, in __call__
    return self.partition(exported_program)
  File "/home/lei/executorch/backends/qualcomm/partition/qnn_partitioner.py", line 138, in partition
    partitions = self.generate_partitions(edge_program)
  File "/home/lei/executorch/backends/qualcomm/partition/qnn_partitioner.py", line 124, in generate_partitions
    return generate_partitions_from_list_of_nodes(
  File "/home/lei/executorch/exir/backend/canonical_partitioners/pattern_op_partitioner.py", line 50, in generate_partitions_from_list_of_nodes
    partition_list = capability_partitioner.propose_partitions()
  File "/home/lei/miniconda3/envs/executorch/lib/python3.10/site-packages/torch/fx/passes/infra/partitioner.py", line 201, in propose_partitions
    if self.__is_node_supported(node) and node not in assignment:
  File "/home/lei/miniconda3/envs/executorch/lib/python3.10/site-packages/torch/fx/passes/infra/partitioner.py", line 79, in __is_node_supported
    self.operator_support.is_node_supported(dict(self.graph_module.named_modules()), node)
  File "/home/lei/executorch/backends/qualcomm/partition/qnn_partitioner.py", line 72, in is_node_supported
    and self.node_visitors[node.target.__name__]
KeyError: 'aten.where.self'
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn device
[INFO] [Qnn ExecuTorch]: Destroy Qnn backend

I also tried the other PTQ options, and only the 8a8w mode can export the .pte binary. Does this error mean that QNN does not currently support some of the operators in llama, especially in the int16 and FP16 configurations?
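For reference, the sweep looked roughly like this; the --ptq flag name and mode spellings are my recollection of the script's argparse options, so treat them as illustrative rather than exact:

python -m examples.qualcomm.scripts.dummy_llama2 --ptq 8a8w -m SM8650 -b ./build_android     # exports a .pte
python -m examples.qualcomm.scripts.dummy_llama2 --ptq 16a16w -m SM8650 -b ./build_android   # fails to export for me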

cccclai commented 5 months ago

The aten.where.self op comes from decomposing F.scaled_dot_product_attention. In https://github.com/pytorch/executorch/pull/3037 it is replaced with a simpler sdpa operator, so the where op no longer appears in the graph.
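A quick way to see where that op comes from (a minimal sketch; the exact set of decomposed ops depends on your torch version, so the masked_fill-to-where lowering is an assumption about the current core ATen decompositions):

import torch
import torch.nn.functional as F

class Attn(torch.nn.Module):
    def forward(self, q, k, v, mask):
        # With an explicit boolean mask, sdpa's decomposition applies the
        # mask via masked_fill, which in turn lowers to aten.where.self.
        return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)

q = k = v = torch.randn(1, 8, 16, 64)
mask = torch.tril(torch.ones(1, 1, 16, 16, dtype=torch.bool))  # causal mask
ep = torch.export.export(Attn(), (q, k, v, mask)).run_decompositions()
# aten.where.self should show up among the call_function targets:
print({str(n.target) for n in ep.graph.nodes if n.op == "call_function"})

Once the PR above lands, the llama export path keeps sdpa as a single operator instead of decomposing it, so the partitioner never sees the where node.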