pytorch / executorch

On-device AI across mobile, embedded and edge for PyTorch
https://pytorch.org/executorch/

Mutable buffer fails to lower to QNN backend #4075

Open leigao97 opened 3 weeks ago

leigao97 commented 3 weeks ago

When I lower this toy example to the QNN backend, the linear operator causes an error.

    class MutableStateModule(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.register_buffer("state", torch.tensor(torch.zeros(1,1)))

        def forward(self, x):
            self.state.add_(torch.ones(1,1))
            return x @ self.state.T

Here is the error message:

Traceback (most recent call last):
  File "/home/lei/llama_mobile/python/android/test2.py", line 89, in <module>
    build_executorch_binary(
  File "/home/lei/llama_mobile/python/android/utils.py", line 215, in build_executorch_binary
    edge_prog.exported_program = to_backend(edge_prog.exported_program, qnn_partitioner)
  File "/home/lei/miniconda3/envs/executorch/lib/python3.10/functools.py", line 878, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/lei/llama_mobile/executorch/exir/backend/backend_api.py", line 363, in _
    partitioner_result = partitioner_instance(fake_edge_program)
  File "/home/lei/llama_mobile/executorch/exir/backend/partitioner.py", line 64, in __call__
    return self.partition(exported_program)
  File "/home/lei/llama_mobile/executorch/backends/qualcomm/partition/qnn_partitioner.py", line 138, in partition
    partitions = self.generate_partitions(edge_program)
  File "/home/lei/llama_mobile/executorch/backends/qualcomm/partition/qnn_partitioner.py", line 124, in generate_partitions
    return generate_partitions_from_list_of_nodes(
  File "/home/lei/llama_mobile/executorch/exir/backend/canonical_partitioners/pattern_op_partitioner.py", line 50, in generate_partitions_from_list_of_nodes
    partition_list = capability_partitioner.propose_partitions()
  File "/home/lei/miniconda3/envs/executorch/lib/python3.10/site-packages/torch/fx/passes/infra/partitioner.py", line 201, in propose_partitions
    if self.__is_node_supported(node) and node not in assignment:
  File "/home/lei/miniconda3/envs/executorch/lib/python3.10/site-packages/torch/fx/passes/infra/partitioner.py", line 79, in __is_node_supported
    self.operator_support.is_node_supported(dict(self.graph_module.named_modules()), node)
  File "/home/lei/llama_mobile/executorch/backends/qualcomm/partition/qnn_partitioner.py", line 79, in is_node_supported
    op_wrapper = self.node_visitors[node.target.__name__].define_node(
  File "/home/lei/llama_mobile/executorch/backends/qualcomm/builders/op_linear.py", line 52, in define_node
    weight_tensor_wrapper = self.define_tensor(
  File "/home/lei/llama_mobile/executorch/backends/qualcomm/builders/node_visitor.py", line 301, in define_tensor
    dims = [1] if len(tensor.size()) == 0 else tensor.size()
AttributeError: 'NoneType' object has no attribute 'size'

If I replace the linear operation x @ self.state.T with the addition operation x + self.state.T, it works.
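
For reference, here is a minimal sketch of the variant that lowers successfully for me (same module, with the matmul replaced by an elementwise add):

    import torch

    class MutableStateAddModule(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.register_buffer("state", torch.zeros(1, 1))

        def forward(self, x):
            self.state.add_(torch.ones(1, 1))
            # Elementwise add instead of x @ self.state.T; this variant partitions without the error.
            return x + self.state.T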

Jack-Khuu commented 3 weeks ago

@cccclai For QNN Lowering

cccclai commented 3 weeks ago

What does the graph look like for this toy model?

leigao97 commented 3 weeks ago

The graph looks like this:

def forward(self, b_state, x):
    aten_full_default = executorch_exir_dialects_edge__ops_aten_full_default([1, 1], 1, dtype = torch.float32, layout = torch.strided, device = device(type='cpu'), pin_memory = False)
    aten_add_tensor = executorch_exir_dialects_edge__ops_aten_add_Tensor(b_state, aten_full_default);  b_state = aten_full_default = None
    aten_linear_default = executorch_exir_dialects_edge__ops_aten_linear_default(x, aten_add_tensor);  x = None
    return (aten_add_tensor, aten_linear_default)

cccclai commented 3 weeks ago

Is it after torch.export or to_edge? Mind sharing the repro script?

leigao97 commented 3 weeks ago

Judging from the op names, it looks like the edge dialect graph after to_edge. Here is my script:

import torch

from executorch.backends.qualcomm.partition.qnn_partitioner import QnnPartitioner
from executorch.backends.qualcomm.utils.utils import (
    capture_program,
    generate_htp_compiler_spec,
    generate_qnn_executorch_compiler_spec,
)
from executorch.backends.qualcomm.serialization.qnn_compile_spec_schema import (
    QcomChipset,
)
from executorch.exir.backend.backend_api import to_backend

class MutableStateModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("state", torch.tensor(torch.zeros(1,1)))

    def forward(self, x):
        self.state.add_(torch.ones(1,1))
        return x @ self.state.T

model = MutableStateModule()

inputs = (torch.zeros(1,1),)

edge_prog = capture_program(model, inputs)

qnn_partitioner = QnnPartitioner(
    generate_qnn_executorch_compiler_spec(
        soc_model=QcomChipset.SM8650,
        backend_options=generate_htp_compiler_spec(use_fp16=True),
        debug=False,
        saver=False,
        shared_buffer=False,
    ),
)
edge_prog.exported_program = to_backend(edge_prog.exported_program, qnn_partitioner)

On my side, I need to set environment variables such as $EXECUTORCH_ROOT and PYTHONPATH before running the script. Here is the reference: https://pytorch.org/executorch/stable/build-run-qualcomm-ai-engine-direct-backend.html#setting-up-your-developer-environment
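
For reference, the graph I posted above can be printed from the captured program, roughly like this (assuming the edge_prog object from the script above):

    print(edge_prog.exported_program.graph_module.code)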

leigao97 commented 2 weeks ago

Are there any updates on this issue? Thanks!

cccclai commented 2 weeks ago

Not yet - I think it's similar to this issue: https://github.com/pytorch/executorch/issues/4042

leigao97 commented 1 week ago

The graph looks like this:

def forward(self, b_state, x):
    aten_full_default = executorch_exir_dialects_edge__ops_aten_full_default([1, 1], 1, dtype = torch.float32, layout = torch.strided, device = device(type='cpu'), pin_memory = False)
    aten_add_tensor = executorch_exir_dialects_edge__ops_aten_add_Tensor(b_state, aten_full_default);  b_state = aten_full_default = None
    aten_linear_default = executorch_exir_dialects_edge__ops_aten_linear_default(x, aten_add_tensor);  x = None
    return (aten_add_tensor, aten_linear_default)

As shown in the graph, the linear op takes the output of the add op as its weight tensor. In the code below, weight_node is aten_add_tensor, which is not a parameter, so the parameter lookup returns None: https://github.com/pytorch/executorch/blob/5584b9e3c865edca239ec5df6346f1d1aabb0276/backends/qualcomm/builders/op_linear.py#L51 The weight_tensor_wrapper then needs the value of the weight tensor, which causes the AttributeError.
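
One way to see this (a rough sketch, assuming the edge_prog from my script above): aten_add_tensor is an intermediate op output, so it does not appear in the graph signature's parameter or buffer mappings, and there is no stored value to look up.

    prog = edge_prog.exported_program
    sig = prog.graph_signature
    print(sig.inputs_to_parameters)  # placeholder name -> parameter name (empty for this model)
    print(sig.inputs_to_buffers)     # e.g. {'b_state': 'state'}; aten_add_tensor is absent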

I replaced the get_parameter call with get_tensor, and it seems to work:

weight_tensor = self.get_tensor(weight_node, node)

Is this an acceptable workaround? Thank you.

cccclai commented 1 week ago

Hmm, does it work at runtime? I sort of doubt it...

cccclai commented 23 hours ago

@leigao97 Hey, just wanted to follow up on this: are you still blocked on this issue?

leigao97 commented 23 hours ago

Yes, the modification above was not correct. I ran into this issue because I want to run an int8 weight-only quantized model on the QNN backend, following this procedure:

https://github.com/pytorch/executorch/blob/b448254a88edc6c30d8926abcebf5a1871d675cf/examples/models/llama2/source_transformation/quantize.py#L355

I found that the root cause is this: if any operation is performed on the buffer, an operator node is inserted into the graph, and the output of that operator has no parameter value, so the linear op builder fails. In the quantized linear forward function linked above, the weight buffer is cast to floating point, which also triggers this issue.
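
To illustrate (a minimal sketch, not the actual quantize.py code), a module like this reproduces the same pattern: the linear weight is the output of a cast op rather than a plain buffer, so there is no parameter value at partition time.

    import torch

    class CastedWeightLinear(torch.nn.Module):
        # Hypothetical example mimicking the weight-only quantized linear above:
        # the weight is stored as int8 and cast to float in forward.
        def __init__(self):
            super().__init__()
            self.register_buffer("weight", torch.zeros(4, 4, dtype=torch.int8))

        def forward(self, x):
            # The cast turns the linear's weight input into an op output in the graph.
            w = self.weight.to(torch.float32)
            return torch.nn.functional.linear(x, w)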

For now, I am using the XNNPACK backend instead.