pytorch / executorch

On-device AI across mobile, embedded and edge for PyTorch
https://pytorch.org/executorch/

Exporting to ExecuTorch using the XNNPACK backend hangs during EdgeProgramManager.to_backend(XnnpackPartitioner()) #6191

Open virginia-cangelosi opened 1 month ago

virginia-cangelosi commented 1 month ago

🐛 Describe the bug

When exporting a PyTorch model to ExecuTorch, the following conversion script hangs on the line `edge_program = edge_program.to_backend(XnnpackPartitioner())`:

```python
import torch
from torch.export import Dim
import logging
from torch.export import export
from executorch.exir import to_edge
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from lib.nets import CascadedNet

model = CascadedNet(2048, 1024, 32, 128)

toload = '/Users/**/vocal-remover/models/baseline.pth'
model.load_state_dict(torch.load(toload, map_location=torch.device('cpu')))

with torch.no_grad():
    model.eval()

dummy_input = torch.randn(4, 2, 1025, 256)

torch._dynamo.config.verbose = True
torch._logging.set_logs(dynamo=logging.INFO)
aten_dialect = export(model, (dummy_input,))

edge_program = to_edge(aten_dialect)

edge_program = edge_program.to_backend(XnnpackPartitioner())

executorch_program = edge_program.to_executorch()

with open("vocal-remover.pte", "wb") as file:
    file.write(executorch_program.buffer)
```

I have let it run for over 3 hours, but it does not appear to make any progress. I have included the last few lines of the terminal output, as well as what is reported when the script is killed with Ctrl+C. Note that I was able to create a .pte successfully with the exact same script, excluding the `.to_backend` line.

```
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1434] Dynamo captured graph:
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1434]
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1434]  class GraphModule(torch.nn.Module):
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1434]     def forward(self, L_q_: "f16[8, 16, 32, 64]", L_k_: "f16[8, 16, 32, 64]", L_v_: "f16[8, 16, 32, 64]", L_attn_mask_: "f16[32, 32]"):
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1434]         l_q_ = L_q_
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1434]         l_k_ = L_k_
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1434]         l_v_ = L_v_
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1434]         l_attn_mask_ = L_attn_mask_
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1434]
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1434]         # File: /Users/*******/venv10/lib/python3.10/site-packages/executorch/backends/xnnpack/partition/graphs/sdpa.py:37 in forward, code: return torch.nn.functional.scaled_dot_product_attention(
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1434]         scaled_dot_product_attention: "f16[8, 16, 32, 64]" = torch._C._nn.scaled_dot_product_attention(l_q_, l_k_, l_v_, attn_mask = l_attn_mask_, dropout_p = 0.0, is_causal = False, scale = None); l_q_ = l_k_ = l_v_ = None
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1434]         return (l_attn_mask_, scaled_dot_product_attention)
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1434]
```

````
I1014 08:57:44.785000 8466665280 torch/_dynamo/logging.py:56] [6/0] Step 1: torchdynamo start tracing forward /Users/*******/venv10/lib/python3.10/site-packages/executorch/backends/xnnpack/partition/graphs/sdpa.py:30
I1014 08:57:44.789000 8466665280 torch/_dynamo/logging.py:56] [6/0] Step 1: torchdynamo done tracing forward (RETURN_VALUE)
I1014 08:57:44.789000 8466665280 torch/_dynamo/logging.py:56] [6/0] Step 2: calling compiler function dynamo_normalization_capturing_compiler
I1014 08:57:44.789000 8466665280 torch/_dynamo/logging.py:56] [6/0] Step 2: done compiler function dynamo_normalization_capturing_compiler
I1014 08:57:44.791000 8466665280 torch/fx/experimental/symbolic_shapes.py:3639] [6/0] produce_guards
I1014 08:57:44.893000 8466665280 torch/fx/experimental/symbolic_shapes.py:3549] create_symbol s0 = 8 for __meta_utils_unknown_tensor33.size()[0] [2, 9223372036854775806] (utils/_pytree.py:787 in unflatten), for more info run with TORCHDYNAMO_EXTENDED_DEBUG_CREATE_SYMBOL="s0"
I1014 08:57:44.894000 8466665280 torch/fx/experimental/symbolic_shapes.py:3549] create_symbol s1 = 16 for __meta_utils_unknown_tensor33.size()[1] [2, 9223372036854775806] (utils/_pytree.py:787 in unflatten), for more info run with TORCHDYNAMO_EXTENDED_DEBUG_CREATE_SYMBOL="s1"
I1014 08:57:44.895000 8466665280 torch/fx/experimental/symbolic_shapes.py:3549] create_symbol s2 = 32 for __meta_utils_unknown_tensor33.size()[2] [2, 9223372036854775806] (utils/_pytree.py:787 in unflatten), for more info run with TORCHDYNAMO_EXTENDED_DEBUG_CREATE_SYMBOL="s2"
I1014 08:57:44.896000 8466665280 torch/fx/experimental/symbolic_shapes.py:3549] create_symbol s3 = 64 for __meta_utils_unknown_tensor33.size()[3] [2, 9223372036854775806] (utils/_pytree.py:787 in unflatten), for more info run with TORCHDYNAMO_EXTENDED_DEBUG_CREATE_SYMBOL="s3"
I1014 08:57:44.905000 8466665280 torch/fx/experimental/symbolic_shapes.py:4831] set_replacement s2 = 32 (solve) VR[32, 32]
I1014 08:57:44.905000 8466665280 torch/fx/experimental/symbolic_shapes.py:5082] eval Eq(32, s2) [guard added] (<eval_with_key>.276:9 in forward), for more info run with TORCHDYNAMO_EXTENDED_DEBUG_GUARD_ADDED="Eq(32, s2)"
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1410] Summary of dimension constraints:
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1410] The following dimensions have been specialized and CANNOT be dynamic.
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1410] ```
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1410] def specializations(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor, attn_mask: Optional[torch.Tensor] = None):
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1410]     # q:
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1410]     assert q.size()[0] == 8
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1410]     assert q.size()[1] == 16
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1410]     assert q.size()[2] == 32
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1410]     assert q.size()[3] == 64
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1410]
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1410]     # k:
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1410]     assert k.size()[0] == 8
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1410]     assert k.size()[1] == 16
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1410]     assert k.size()[2] == 32
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1410]     assert k.size()[3] == 64
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1410]
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1410]     # v:
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1410]     assert v.size()[0] == 8
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1410]     assert v.size()[1] == 16
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1410]     assert v.size()[2] == 32
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1410]     assert v.size()[3] == 64
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1410]
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1410]     # attn_mask:
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1410]     assert attn_mask.size()[0] == 32
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1410]     assert attn_mask.size()[1] == 32
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1410] ```
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1410]
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1434] Dynamo captured graph:
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1434]
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1434]  class GraphModule(torch.nn.Module):
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1434]     def forward(self, L_q_: "f16[8, 16, 32, 64]", L_k_: "f16[8, 16, 32, 64]", L_v_: "f16[8, 16, 32, 64]", L_attn_mask_: "f16[32, 32]"):
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1434]         l_q_ = L_q_
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1434]         l_k_ = L_k_
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1434]         l_v_ = L_v_
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1434]         l_attn_mask_ = L_attn_mask_
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1434]
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1434]         # File: /Users/*******/venv10/lib/python3.10/site-packages/executorch/backends/xnnpack/partition/graphs/sdpa.py:37 in forward, code: return torch.nn.functional.scaled_dot_product_attention(
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1434]         scaled_dot_product_attention: "f16[8, 16, 32, 64]" = torch._C._nn.scaled_dot_product_attention(l_q_, l_k_, l_v_, attn_mask = l_attn_mask_, dropout_p = 0.0, is_causal = False, scale = None); l_q_ = l_k_ = l_v_ = None
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1434]         return (l_attn_mask_, scaled_dot_product_attention)
I1014 08:57:44.909000 8466665280 torch/_dynamo/eval_frame.py:1434]
^CTraceback (most recent call last):
  File "/Users/*******/en-ch-songtranslator/egs2/opencpop/svs1/../../../../executorch/create_pte_vocal-remover.py", line 38, in <module>
    edge_program = edge_program.to_backend(XnnpackPartitioner())
  File "/Users/*******/venv10/lib/python3.10/site-packages/executorch/exir/program/_program.py", line 1166, in to_backend
    new_edge_programs[name] = to_backend(program, partitioner)
  File "/opt/homebrew/Cellar/python@3.10/3.10.15/Frameworks/Python.framework/Versions/3.10/lib/python3.10/functools.py", line 889, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/Users/*******/venv10/lib/python3.10/site-packages/executorch/exir/backend/backend_api.py", line 363, in _
    partitioner_result = partitioner_instance(fake_edge_program)
  File "/Users/*******/venv10/lib/python3.10/site-packages/executorch/exir/backend/partitioner.py", line 66, in __call__
    return self.partition(exported_program)
  File "/Users/*******/venv10/lib/python3.10/site-packages/executorch/backends/xnnpack/partition/xnnpack_partitioner.py", line 1148, in partition
    ret: PartitionResult = self._partition(exported_program, self.quant)
  File "/Users/*******/venv10/lib/python3.10/site-packages/executorch/backends/xnnpack/partition/xnnpack_partitioner.py", line 1139, in _partition
    partitions = self.generate_partitions(exported_program, quant)
  File "/Users/*******/venv10/lib/python3.10/site-packages/executorch/backends/xnnpack/partition/xnnpack_partitioner.py", line 1104, in generate_partitions
    return generate_partitions_from_list_of_nodes(
  File "/Users/*******/venv10/lib/python3.10/site-packages/executorch/exir/backend/canonical_partitioners/pattern_op_partitioner.py", line 50, in generate_partitions_from_list_of_nodes
    partition_list = capability_partitioner.propose_partitions()
  File "/Users/*******/venv10/lib/python3.10/site-packages/torch/fx/passes/infra/partitioner.py", line 218, in propose_partitions
    maybe_merge_partition(self_id, other_id)
  File "/Users/*******/venv10/lib/python3.10/site-packages/torch/fx/passes/infra/partitioner.py", line 138, in maybe_merge_partition
    if dfs_iter_find_cycle(all_user_nodes):
  File "/Users/*******/venv10/lib/python3.10/site-packages/torch/fx/passes/infra/partitioner.py", line 109, in dfs_iter_find_cycle
    if path_node in merged_nodes:
KeyboardInterrupt
I1014 11:55:59.376000 8466665280 torch/_dynamo/utils.py:335] TorchDynamo compilation metrics:
I1014 11:55:59.376000 8466665280 torch/_dynamo/utils.py:335] Function                         Runtimes (s)
I1014 11:55:59.376000 8466665280 torch/_dynamo/utils.py:335] -------------------------------  --------------
I1014 11:55:59.376000 8466665280 torch/_dynamo/utils.py:335] _compile.<locals>.compile_inner  3.7328
I1014 11:55:59.376000 8466665280 torch/_dynamo/utils.py:335] OutputGraph.call_user_compiler   0.001
I1014 11:55:59.376000 8466665280 torch/_dynamo/utils.py:335] create_aot_dispatcher_function   26.2178
````
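
From the traceback, the time appears to be spent inside `propose_partitions()` / `dfs_iter_find_cycle()` in the FX capability partitioner. A minimal sketch of how the hang location could be confirmed without killing the process, using the standard-library `faulthandler` module (the 10-minute interval is an arbitrary choice, not something ExecuTorch requires):

```python
import faulthandler

# Periodically dump the Python stack of the running export so the hang location
# can be checked without pressing Ctrl+C. faulthandler is in the standard library;
# the interval below is only an illustration.
faulthandler.dump_traceback_later(timeout=600, repeat=True)  # dump every 10 minutes
edge_program = edge_program.to_backend(XnnpackPartitioner())
faulthandler.cancel_dump_traceback_later()
```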

Versions

```
Collecting environment information...
PyTorch version: 2.4.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 14.6.1 (arm64)
GCC version: Could not collect
Clang version: 15.0.0 (clang-1500.0.40.1)
CMake version: version 3.30.3
Libc version: N/A

Python version: 3.10.15 (main, Sep 7 2024, 00:20:06) [Clang 15.0.0 (clang-1500.3.9.4)] (64-bit runtime)
Python platform: macOS-14.6.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU: Apple M2 Pro

Versions of relevant libraries:
[pip3] audiolm-pytorch==1.1.4
[pip3] ema-pytorch==0.5.1
[pip3] executorch==0.3.0a0+7d77d78
[pip3] lion-pytorch==0.2.2
[pip3] numpy==1.23.5
[pip3] onnxruntime==1.18.1
[pip3] optree==0.12.1
[pip3] pytorch-wpe==0.0.1
[pip3] torch==2.4.0
[pip3] torch-complex==0.4.4
[pip3] torchaudio==2.4.0
[pip3] torchsr==1.0.4
[pip3] torchtext==0.18.0
[pip3] torchvision==0.19.0
[pip3] vector-quantize-pytorch==1.14.26
[conda] Could not collect
```

virginia-cangelosi commented 1 month ago

Hello, I was wondering if there were any updates on this, and whether there is any sort of workaround I can try.

virginia-cangelosi commented 2 weeks ago

I have resolved this issue by using a PC with more memory, as the conversion required 27 GB. Would it be possible to incorporate an error message into the conversion so that it fails when problems like this occur? That would be much better than it hanging forever.
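
For what it's worth, a minimal sketch of the kind of up-front check I have in mind; `psutil` and the ~27 GB threshold are illustrative assumptions based on what I observed here, not part of the ExecuTorch API:

```python
import psutil

# Fail fast if the machine clearly lacks the memory the partitioning step needed
# for this model (~27 GB observed); the threshold is model-specific and illustrative.
REQUIRED_BYTES = 27 * 1024**3
available = psutil.virtual_memory().available
if available < REQUIRED_BYTES:
    raise MemoryError(
        f"to_backend(XnnpackPartitioner()) needed ~{REQUIRED_BYTES / 1e9:.0f} GB here, "
        f"but only {available / 1e9:.0f} GB is available"
    )

edge_program = edge_program.to_backend(XnnpackPartitioner())
```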