pytorch / executorch

On-device AI across mobile, embedded and edge for PyTorch
https://pytorch.org/executorch/

[Llama] [Dynamic Shape] [Core ML Delegate] Do Not Delegate Symbol Manipulation #4658

Open YifanShenSZ opened 1 month ago

YifanShenSZ commented 1 month ago

When exporting dynamic-shape llama2, the attached piece of the partitioned model contains symbol manipulations such as

add: "Sym(s0 + u156)" = _local_scalar_dense + sym_size
...
le_1: "Sym(s0 + u156 <= 128)" = add <= 128

These come from the dynamic-shape check assertions in the original model:

add: "Sym(s0 + u156)" = _local_scalar_dense + sym_size
le_1: "Sym(s0 + u156 <= 128)" = add <= 128
_assert_scalar_2 = torch.ops.aten._assert_scalar.default(le_1, "Runtime assertion failed for expression s0 + u0 <= 128 on node 'le_27'");  le_1 = None
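
For context, this pattern can be reproduced with a minimal toy export (a hypothetical module, not the actual llama code): a data-dependent .item() plus a torch._check upper bound yields exactly this _local_scalar_dense / add / le / _assert_scalar chain.

import torch

class Toy(torch.nn.Module):
    def forward(self, x, pos):
        p = pos.item()                       # _local_scalar_dense -> unbacked symbol u0
        torch._check(p + x.shape[0] <= 128)  # produces the add / le / _assert_scalar nodes
        return x + p

ep = torch.export.export(
    Toy(),
    (torch.randn(4, 8), torch.tensor(2)),
    dynamic_shapes=({0: torch.export.Dim("s0", max=128)}, None),
)
print(ep.graph)  # shows the sym_size / add / le / _assert_scalar chain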

This leads to two problems:

  1. Core ML does not support a bare symbol as model I/O
  2. torch.ops.aten._assert_scalar.default is scattered all over the program; it is not supported in Core ML, so it may lead to undesirable graph breaks

Imho:

  1. These dynamic-shape checks are very small, so doing them in the ExecuTorch runtime may be more efficient than paying the delegate overhead
  2. All assertions could be performed at the beginning of the program in the ExecuTorch runtime, so the remaining heavy computation can be delegated to Core ML as a whole graph (see the sketch below)
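
A hedged FX sketch of idea (2), assuming the checks appear as torch.ops.aten._assert_scalar.default nodes as in the snippets above; this is illustrative only, not the actual ExecuTorch partitioner logic:

import torch

def hoist_assert_chains(gm: torch.fx.GraphModule) -> torch.fx.GraphModule:
    g = gm.graph
    asserts = [
        n for n in g.nodes
        if n.op == "call_function" and n.target == torch.ops.aten._assert_scalar.default
    ]
    # Collect the dependency closure of every assert (sym_size / add / le / ...)
    to_hoist = set()
    def visit(n):
        if n.op == "placeholder" or n in to_hoist:
            return
        for inp in n.all_input_nodes:
            visit(inp)
        to_hoist.add(n)
    for a in asserts:
        visit(a)
    # Move the closure, preserving its original order, to right after the inputs,
    # so everything below it forms one delegable region.
    insert_point = [n for n in g.nodes if n.op == "placeholder"][-1]
    for n in [n for n in g.nodes if n in to_hoist]:
        insert_point.append(n)  # fx moves an existing node when re-inserted
        insert_point = n
    g.lint()
    gm.recompile()
    return gm
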
cccclai commented 1 month ago

Discussed a bit; there are two action items:

  1. Convert the symint etc. to a tensor, so it can be consumed by the coreml IR (sketched below)
  2. Remove the assert in the model definition; this can likely be removed for the batch-prefill branch in the llama transformer
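
For item (1), a hedged illustration of what "symint to tensor" could mean at the boundary (hypothetical helper; whether scalar_tensor traces through cleanly here depends on the export context):

import torch

def symint_to_tensor(p):
    # Re-materialize a SymInt (e.g. the result of .item()) as a 0-d tensor, so
    # the delegated graph consumes a tensor instead of a bare symbol.
    return torch.scalar_tensor(p, dtype=torch.int32)
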
YifanShenSZ commented 3 weeks ago

cc @kimishpatel

kimishpatel commented 3 weeks ago

Convert symint etc to tensor, then they can be consumed by coreml IR.

@YifanShenSZ do you know what this is intended for?

YifanShenSZ commented 3 weeks ago

I encountered this issue when trying to convert the dynamic-shape stories 110M model to Core ML. These symbols & assertions look like shape-range guards to me

angelayi commented 3 weeks ago

torch.ops.aten._assert_scalar.default is scattered all over the program, which is not supported in Core ML so may lead to undesirable graph breaks

I don't have a lot of context on this, but it seems like another solution could be to consume the asserts to prevent the undesirable graph breaks, and then within your preprocess call, call RemoveGraphAssertsPass to remove these asserts, which will also call DCE and remove the unnecessary add/le/ge computations.
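
A hedged sketch of that suggestion (the import path is assumed from the current ExecuTorch tree, and the backend preprocess signature is simplified):

from executorch.exir.passes.remove_graph_asserts_pass import RemoveGraphAssertsPass

def preprocess(edge_program, compile_specs):
    # Per the suggestion above, this pass drops the _assert_scalar nodes and
    # runs DCE, which also removes the now-unused add/le/ge computations.
    RemoveGraphAssertsPass()(edge_program.graph_module)
    # ... then hand edge_program.graph_module to the Core ML converter ...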

YifanShenSZ commented 3 weeks ago

consume the asserts to prevent the undesirable graph breaks, and then within your preprocess call, call RemoveGraphAssertsPass to remove these asserts

Would it be dangerous to have the delegate silently behave differently from the original program?

angelayi commented 3 weeks ago

Would it be dangerous to have the delegate silently behave differently from the original program?

These are just asserts so... I think it doesn't matter if you remove them? It's just a correctness check. ExecuTorch calls the same pass to remove asserts right before lowering to the ExecuTorch runtime.

cccclai commented 3 weeks ago

It's probably better to have coreml consume these assert ops.

For llama specifically, are those checks from the separate branch that is only for batch prefill? If yes, can we just not use .item and unify the branches? @angelayi @kimishpatel

angelayi commented 3 weeks ago

There's also a .item call here, but I'm not sure how you would avoid the .item call if you want a dynamic input position.

cccclai commented 3 weeks ago

@angelayi yeah, I meant removing those .item calls (i.e. removing the if self.enable_dynamic_shape branch) and just using the else statement, these lines. @kimishpatel added the lines in if self.enable_dynamic_shape because he ran into an export error when trying to export with dynamic shape, but I think we can export with dynamic shape with the original code directly
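
To make the two branches concrete, here is a hedged paraphrase of the KV-cache update being discussed (names and shapes are hypothetical, not the actual llama_transformer code); the proposal is to drop the .item()-based branch and export with the index_put-style path:

import torch

class KVCacheSketch(torch.nn.Module):
    def __init__(self, enable_dynamic_shape: bool, max_seq_len: int = 128):
        super().__init__()
        self.enable_dynamic_shape = enable_dynamic_shape
        self.register_buffer("k_cache", torch.zeros(1, 8, max_seq_len, 64))

    def update(self, input_pos, k_val):
        if self.enable_dynamic_shape:
            start = input_pos[0].item()  # .item() -> _local_scalar_dense + guards
            torch._check_is_size(start)
            self.k_cache.narrow(2, start, k_val.size(2)).copy_(k_val)
        else:
            self.k_cache[:, :, input_pos] = k_val  # no .item(), no unbacked symbols
        return self.k_cache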

YifanShenSZ commented 3 weeks ago

consume the asserts to prevent the undesirable graph breaks, and then within your preprocess call, call RemoveGraphAssertsPass to remove these asserts

It's probably better to have coreml consume these assert ops.

I see, I'll consume them in coreml and make them no-ops
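
A hedged sketch of the no-op approach via coremltools' torch-op registry (a real extension point, though the exact op-name matching against the exported _assert_scalar may differ by torch/coremltools version):

from coremltools.converters.mil.frontend.torch.torch_op_registry import register_torch_op

@register_torch_op
def _assert_scalar(context, node):
    # Deliberately emit nothing: the check is either done outside the delegate
    # or dropped, so it no longer forces a Core ML graph break.
    pass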

YifanShenSZ commented 2 weeks ago

Consuming all assertions and making them no-ops successfully delegates the entire program to coreml 👏

However, the delegated edge program manager cannot be created, due to some dynamic-shape guard:

Running MIL default pipeline: 100%|████████████████████████████████████████| 86/86 [00:18<00:00,  4.76 passes/s]
Running MIL backend_mlprogram pipeline: 100%|██████████████████████████████| 12/12 [00:00<00:00, 57.85 passes/s]
W0829 12:33:30.318000 58958 torch/fx/experimental/symbolic_shapes.py:5131] failed during evaluate_expr(s0 + u169 > 128, hint=None, expect_rational=True, size_oblivious=True, forcing_spec=False
E0829 12:33:30.318000 58958 torch/fx/experimental/recording.py:298] failed while running evaluate_expr(*(s0 + u169 > 128, None), **{'fx_node': False, 'size_oblivious': True})
E0829 12:33:30.321000 58958 torch/_subclasses/fake_tensor.py:2027] failed while attempting to run meta for aten.slice_copy.Tensor
E0829 12:33:30.321000 58958 torch/_subclasses/fake_tensor.py:2027] Traceback (most recent call last):
E0829 12:33:30.321000 58958 torch/_subclasses/fake_tensor.py:2027]   File "/Volumes/Models/LLM/Framework/CoreMLTools-Dev_ExecuTorch-Dev_2024-08-28/envs/llama-py310/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py", line 2023, in _dispatch_impl
E0829 12:33:30.321000 58958 torch/_subclasses/fake_tensor.py:2027]     r = func(*args, **kwargs)
E0829 12:33:30.321000 58958 torch/_subclasses/fake_tensor.py:2027]   File "/Volumes/Models/LLM/Framework/CoreMLTools-Dev_ExecuTorch-Dev_2024-08-28/envs/llama-py310/lib/python3.10/site-packages/torch/_ops.py", line 713, in __call__
E0829 12:33:30.321000 58958 torch/_subclasses/fake_tensor.py:2027]     return self._op(*args, **kwargs)
E0829 12:33:30.321000 58958 torch/_subclasses/fake_tensor.py:2027]   File "/Volumes/Models/LLM/Framework/CoreMLTools-Dev_ExecuTorch-Dev_2024-08-28/envs/llama-py310/lib/python3.10/site-packages/torch/_decomp/decompositions.py", line 781, in slice_forward
E0829 12:33:30.321000 58958 torch/_subclasses/fake_tensor.py:2027]     elif statically_known_true(end_val == sys.maxsize) or guard_size_oblivious(
E0829 12:33:30.321000 58958 torch/_subclasses/fake_tensor.py:2027]   File "/Volumes/Models/LLM/Framework/CoreMLTools-Dev_ExecuTorch-Dev_2024-08-28/envs/llama-py310/lib/python3.10/site-packages/torch/fx/experimental/symbolic_shapes.py", line 253, in guard_size_oblivious
E0829 12:33:30.321000 58958 torch/_subclasses/fake_tensor.py:2027]     return expr.node.guard_size_oblivious("", 0)
E0829 12:33:30.321000 58958 torch/_subclasses/fake_tensor.py:2027]   File "/Volumes/Models/LLM/Framework/CoreMLTools-Dev_ExecuTorch-Dev_2024-08-28/envs/llama-py310/lib/python3.10/site-packages/torch/fx/experimental/sym_node.py", line 503, in guard_size_oblivious
E0829 12:33:30.321000 58958 torch/_subclasses/fake_tensor.py:2027]     r = self.shape_env.evaluate_expr(
E0829 12:33:30.321000 58958 torch/_subclasses/fake_tensor.py:2027]   File "/Volumes/Models/LLM/Framework/CoreMLTools-Dev_ExecuTorch-Dev_2024-08-28/envs/llama-py310/lib/python3.10/site-packages/torch/fx/experimental/recording.py", line 262, in wrapper
E0829 12:33:30.321000 58958 torch/_subclasses/fake_tensor.py:2027]     return retlog(fn(*args, **kwargs))
E0829 12:33:30.321000 58958 torch/_subclasses/fake_tensor.py:2027]   File "/Volumes/Models/LLM/Framework/CoreMLTools-Dev_ExecuTorch-Dev_2024-08-28/envs/llama-py310/lib/python3.10/site-packages/torch/fx/experimental/symbolic_shapes.py", line 5129, in evaluate_expr
E0829 12:33:30.321000 58958 torch/_subclasses/fake_tensor.py:2027]     return self._evaluate_expr(orig_expr, hint, fx_node, expect_rational, size_oblivious, forcing_spec=forcing_spec)
E0829 12:33:30.321000 58958 torch/_subclasses/fake_tensor.py:2027]   File "/Volumes/Models/LLM/Framework/CoreMLTools-Dev_ExecuTorch-Dev_2024-08-28/envs/llama-py310/lib/python3.10/site-packages/torch/fx/experimental/symbolic_shapes.py", line 5247, in _evaluate_expr
E0829 12:33:30.321000 58958 torch/_subclasses/fake_tensor.py:2027]     raise self._make_data_dependent_error(
E0829 12:33:30.321000 58958 torch/_subclasses/fake_tensor.py:2027] torch.fx.experimental.symbolic_shapes.GuardOnDataDependentSymNode: Could not guard on data-dependent expression u169 + 3 > 128 (unhinted: s0 + u169 > 128).  (Size-like symbols: none)
E0829 12:33:30.321000 58958 torch/_subclasses/fake_tensor.py:2027] 
E0829 12:33:30.321000 58958 torch/_subclasses/fake_tensor.py:2027] Potential framework code culprit (scroll up for full backtrace):
E0829 12:33:30.321000 58958 torch/_subclasses/fake_tensor.py:2027]   File "/Volumes/Models/LLM/Framework/CoreMLTools-Dev_ExecuTorch-Dev_2024-08-28/envs/llama-py310/lib/python3.10/site-packages/torch/_decomp/decompositions.py", line 781, in slice_forward
E0829 12:33:30.321000 58958 torch/_subclasses/fake_tensor.py:2027]     elif statically_known_true(end_val == sys.maxsize) or guard_size_oblivious(
E0829 12:33:30.321000 58958 torch/_subclasses/fake_tensor.py:2027] 
E0829 12:33:30.321000 58958 torch/_subclasses/fake_tensor.py:2027] For more information, run with TORCH_LOGS="dynamic"
E0829 12:33:30.321000 58958 torch/_subclasses/fake_tensor.py:2027] For extended logs when we create symbols, also add TORCHDYNAMO_EXTENDED_DEBUG_CREATE_SYMBOL="u169"
E0829 12:33:30.321000 58958 torch/_subclasses/fake_tensor.py:2027] If you suspect the guard was triggered from C++, add TORCHDYNAMO_EXTENDED_DEBUG_CPP=1
E0829 12:33:30.321000 58958 torch/_subclasses/fake_tensor.py:2027] For more debugging help, see https://docs.google.com/document/d/1HSuTTVvYH1pTew89Rtpeu84Ht3nQEFTYhAX3Ypa_xJs/edit?usp=sharing
E0829 12:33:30.321000 58958 torch/_subclasses/fake_tensor.py:2027] 
E0829 12:33:30.321000 58958 torch/_subclasses/fake_tensor.py:2027] For C++ stack trace, run with TORCHDYNAMO_EXTENDED_DEBUG_CPP=1
angelayi commented 2 weeks ago

Do you happen to have a better stack trace of where this is being called? It's definitely because the asserts are being removed. After removing the asserts, you cannot run the model with fake tensors anymore, which happens when you re-export or run decompositions. If this is within the delegate, you can make a copy of the program so that removing the asserts does not affect the model when running eagerly.
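
A hedged sketch of that copy suggestion, reusing the simplified preprocess signature from earlier (not the actual backend code):

import copy

def preprocess(edge_program, compile_specs):
    # Mutate a throwaway copy so the caller's exported program keeps its asserts
    # and can still be re-traced with fake tensors afterwards.
    ep_for_coreml = copy.deepcopy(edge_program)
    # ... remove the asserts on ep_for_coreml only, then run the Core ML conversion ...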

YifanShenSZ commented 2 weeks ago

Do you happen to have a better stack trace of where this is being called?

Sure, I've attached stack-trace.txt, which contains everything after the conversion to Core ML

If this is within the delegate, you can make a copy of the program so that removing the asserts does not affect the model when running eagerly.

IIUC, after delegation, it is the edge manager that replaces the delegated graph with a "call delegate" node, so should the "copy" happen in the edge manager?

Inside the delegate, given an exported program, we only output a delegate model that is equivalent to the input exported program