pytorch / executorch

On-device AI across mobile, embedded and edge for PyTorch

[CoreML Backend] Update coreml runner to only profile model when prof… #2574

Closed cymbalrush closed 2 months ago

cymbalrush commented 2 months ago

The CoreML executor profiled the model by default, which causes a problem when the installed SDK is older than 14.4. Profiling is only available on SDK 14.4 and later, and on older SDKs it returns an error, so model execution fails.

This PR addresses the issue by fixing the default so that the model is profiled only when the `profiling_model` option is set.
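
A minimal Python sketch of the gating logic the description implies. It assumes a hypothetical `RunnerOptions` container and a hypothetical `run_model` helper; the real runner is native code, and these names are not its API. Only the `profiling_model` flag and the 14.4 threshold come from the PR. Raising an error when profiling is explicitly requested on an older SDK is an assumption.

```python
from dataclasses import dataclass

# Profiling is only available on SDK 14.4 and later (per the PR description).
MIN_PROFILING_SDK = (14, 4)


@dataclass
class RunnerOptions:
    # Hypothetical stand-in for the runner's `profiling_model` option; off by default.
    profiling_model: bool = False


def parse_sdk_version(version: str) -> tuple[int, ...]:
    """Turn "14.4" into (14, 4) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def run_model(options: RunnerOptions, sdk_version: str) -> str:
    """Return the execution path taken (illustrative only)."""
    if not options.profiling_model:
        # Default: execute without profiling, so older SDKs are never affected.
        return "execute"
    if parse_sdk_version(sdk_version) < MIN_PROFILING_SDK:
        # Assumption: fail loudly when profiling is requested but unsupported.
        raise RuntimeError(f"profiling requires SDK >= 14.4, found {sdk_version}")
    return "execute+profile"


print(run_model(RunnerOptions(), "14.2"))  # execute
```

Because the flag is off by default, callers on older SDKs only reach the profiling path if they explicitly opt in.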

pytorch-bot[bot] commented 2 months ago

🔗 Helpful Links

🧪 See artifacts and rendered test results at

Note: Links to docs will display an error until the docs builds have been completed.

❌ 13 New Failures, 31 Unrelated Failures

As of commit 6990acd7a7876ee7b044136589cbb0a4abcce975 with merge base 3152d7f3311465ec915a6afb21232c411b463dc2:

NEW FAILURES - The following jobs have failed:

* Apple / upload-frameworks-ios: `Credentials could not be loaded, please check your action inputs: Could not load credentials from any providers`
* pull / test-llama-runner-linux (fp16, cmake) / linux-job: `RuntimeError: Command docker exec -t 26fe25d3bc8524610ffdbad104e0c4bb789b20b3c02c51bf461b76edbf93c99b /exec failed with exit code 2`
* pull / test-llama-runner-linux (fp32, cmake) / linux-job: `RuntimeError: Command docker exec -t d5b7912d75fed7c2d063b643aea64689952436adcf5905f6dbf1e6c81bad7f06 /exec failed with exit code 2`
* trunk / test-models-macos (cmake, add, portable, macos-m1-stable, 90) / macos-job: `RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1`
* trunk / test-models-macos (cmake, dl3, xnnpack-quantization-delegation, macos-m1-stable, 90) / macos-job: `Delegation fp32 error`
* trunk / test-models-macos (cmake, edsr, xnnpack-quantization-delegation, macos-m1-stable, 90) / macos-job: `Delegation fp32 error`
* trunk / test-models-macos (cmake, emformer_join, portable, macos-m1-stable, 90) / macos-job: `RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1`
* trunk / test-models-macos (cmake, emformer_predict, portable, macos-m1-stable, 90) / macos-job: `RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1`
* trunk / test-models-macos (cmake, emformer_transcribe, portable, macos-m1-stable, 90) / macos-job: `RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1`
* trunk / test-models-macos (cmake, ic3, portable, macos-m1-stable, 90) / macos-job: `RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1`
* trunk / test-models-macos (cmake, llama2, portable, macos-m1-stable, 90) / macos-job: `RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1`
* trunk / test-models-macos (cmake, mul, portable, macos-m1-stable, 90) / macos-job: `RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1`
* trunk / test-models-macos (cmake, mv3, portable, macos-m1-stable, 90) / macos-job: `RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1`

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

* Apple / test-demo-ios / macos-job: `RuntimeError: Command bash /Users/runner/work/_temp/exec_script failed with exit code 1`
* pull / unittest / macos (buck2) / macos-job: `RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1`
* trunk / test-coreml-delegate / macos-job: `RuntimeError: Command bash /Users/runner/work/_temp/exec_script failed with exit code 1`
* trunk / test-custom-ops-macos (cmake) / macos-job: `RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1`
* trunk / test-models-macos (cmake, add_mul, portable, macos-m1-stable, 90) / macos-job: `RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1`
* trunk / test-models-macos (cmake, add_mul, xnnpack-quantization-delegation, macos-m1-stable, 90) / macos-job: `Delegation fp32 error`
* trunk / test-models-macos (cmake, add, xnnpack-quantization-delegation, macos-m1-stable, 90) / macos-job: `Delegation fp32 error`
* trunk / test-models-macos (cmake, dl3, portable, macos-m1-stable, 90) / macos-job: `RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1`
* trunk / test-models-macos (cmake, edsr, portable, macos-m1-stable, 90) / macos-job: `RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1`
* trunk / test-models-macos (cmake, ic3, xnnpack-quantization-delegation, macos-m1-stable, 90) / macos-job: `Delegation fp32 error`
* trunk / test-models-macos (cmake, ic4, portable, macos-m1-stable, 90) / macos-job: `RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1`
* trunk / test-models-macos (cmake, ic4, xnnpack-quantization-delegation, macos-m1-stable, 90) / macos-job: `Delegation fp32 error`
* trunk / test-models-macos (cmake, linear, portable, macos-m1-stable, 90) / macos-job: `RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1`
* trunk / test-models-macos (cmake, linear, xnnpack-quantization-delegation, macos-m1-stable, 90) / macos-job: `Delegation fp32 error`
* trunk / test-models-macos (cmake, llama2, xnnpack-delegation, macos-m1-stable, 90) / macos-job: `Delegation fp32 error`
* trunk / test-models-macos (cmake, mobilebert, portable, macos-m1-stable, 90) / macos-job: `RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1`
* trunk / test-models-macos (cmake, mobilebert, xnnpack-delegation, macos-m1-stable, 90) / macos-job: `Delegation fp32 error`
* trunk / test-models-macos (cmake, mv2_untrained, portable, macos-m1-stable, 90) / macos-job: `RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1`
* trunk / test-models-macos (cmake, mv2, portable, macos-m1-stable, 90) / macos-job: `RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1`
* trunk / test-models-macos (cmake, mv2, xnnpack-quantization-delegation, macos-m1-stable, 90) / macos-job: `Delegation fp32 error`
* trunk / test-models-macos (cmake, mv3, xnnpack-quantization-delegation, macos-m1-stable, 90) / macos-job: `Delegation fp32 error`
* trunk / test-models-macos (cmake, resnet18, portable, macos-m1-stable, 90) / macos-job: `RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1`
* trunk / test-models-macos (cmake, resnet18, xnnpack-quantization-delegation, macos-m1-stable, 90) / macos-job: `Delegation fp32 error`
* trunk / test-models-macos (cmake, resnet50, portable, macos-m1-stable, 90) / macos-job: `RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1`
* trunk / test-models-macos (cmake, resnet50, xnnpack-quantization-delegation, macos-m1-stable, 90) / macos-job: `Delegation fp32 error`
* trunk / test-models-macos (cmake, softmax, portable, macos-m1-stable, 90) / macos-job: `RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1`
* trunk / test-models-macos (cmake, vit, portable, macos-m1-stable, 90) / macos-job: `RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1`
* trunk / test-models-macos (cmake, vit, xnnpack-delegation, macos-m1-stable, 90) / macos-job: `Delegation fp32 error`
* trunk / test-models-macos (cmake, w2l, portable, macos-m1-stable, 90) / macos-job: `RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1`
* trunk / test-models-macos (cmake, w2l, xnnpack-delegation, macos-m1-stable, 90) / macos-job: `Delegation fp32 error`
* trunk / test-selective-build-macos (cmake) / macos-job: `RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1`
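
The split between "new failures" and "broken trunk" failures in this report can be modeled, in simplified form, as a set comparison against the jobs that also failed on the merge base. This is a hypothetical sketch of that rule. It is not Dr. CI's implementation, which may apply additional rules, and `classify_failures` is an illustrative name.

```python
def classify_failures(
    pr_failures: set[str], merge_base_failures: set[str]
) -> dict[str, set[str]]:
    """Simplified split of a PR's failed CI jobs into new vs. broken-trunk failures."""
    return {
        # Failed on the PR head but not on the merge base: possibly caused by the PR.
        "new": pr_failures - merge_base_failures,
        # Also failing on the merge base: pre-existing breakage, not caused by the PR.
        "broken_trunk": pr_failures & merge_base_failures,
    }


# Example using two job names (shortened) from the lists above.
result = classify_failures(
    pr_failures={"test-models-macos (cmake, mul, portable)", "test-models-macos (cmake, mv2, portable)"},
    merge_base_failures={"test-models-macos (cmake, mv2, portable)"},
)
print(sorted(result["new"]))  # ['test-models-macos (cmake, mul, portable)']
```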

This comment was automatically generated by Dr. CI and updates every 15 minutes.

cymbalrush commented 2 months ago

@pytorchbot label ciflow/trunk

pytorch-bot[bot] commented 2 months ago

Can't add following labels to PR: ciflow/trunk. Please ping one of the reviewers for help.

cymbalrush commented 2 months ago

@guangy10 I am unable to add ciflow/trunk label, any suggestions?

guangy10 commented 2 months ago

@cymbalrush I'm a bit surprised that you were not added to the repo as a contributor. Just added you in. Can you try it again?

cymbalrush commented 2 months ago

@pytorchbot label ciflow/trunk

pytorch-bot[bot] commented 2 months ago

Can't add following labels to PR: ciflow/trunk. Please ping one of the reviewers for help.

cymbalrush commented 2 months ago

@guangy10 still the same issue

cymbalrush commented 2 months ago

@pytorchbot label ciflow/trunk

pytorch-bot[bot] commented 2 months ago

Can't add following labels to PR: ciflow/trunk. Please ping one of the reviewers for help.

guangy10 commented 2 months ago

@cymbalrush I'll add that label to unblock you. In the meantime, @huydhn, can you take a look?

huydhn commented 2 months ago

We recently tightened security around running CI jobs: if you don't have write access to the repo, CI does not run by default. You need to ask a reviewer to explicitly approve the CI run, i.e.

[Screenshot 2024-03-21 at 18.48.56]

Once the run is approved, the ciflow/trunk label can be added to start the trunk jobs. Note that pushing a new commit to the PR invalidates the approval, so a new approval is then needed.
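
The rules described in this comment can be modeled as follows: CI runs if the author has write access or a reviewer approved the current head commit, and trunk jobs additionally need the `ciflow/trunk` label. This is a simplified, hypothetical sketch of those stated rules, not how pytorch-bot or GitHub Actions implement them. The `PullRequest` class and its methods are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PullRequest:
    """Toy model of the CI gating rules described in this comment (hypothetical)."""
    author_has_write_access: bool
    head_sha: str
    approved_sha: Optional[str] = None  # commit a reviewer approved CI for
    labels: set[str] = field(default_factory=set)

    def approve_ci(self) -> None:
        # A reviewer approves CI for the current head commit only.
        self.approved_sha = self.head_sha

    def push(self, new_sha: str) -> None:
        # A new commit moves the head, so the earlier approval no longer matches.
        self.head_sha = new_sha

    def ci_allowed(self) -> bool:
        return self.author_has_write_access or self.approved_sha == self.head_sha

    def trunk_jobs_run(self) -> bool:
        # Trunk jobs additionally need the ciflow/trunk label.
        return self.ci_allowed() and "ciflow/trunk" in self.labels


pr = PullRequest(author_has_write_access=False, head_sha="aaa")
pr.approve_ci()
pr.labels.add("ciflow/trunk")
print(pr.trunk_jobs_run())  # True
pr.push("bbb")
print(pr.trunk_jobs_run())  # False
```

Tying the approval to a specific commit is what makes a new push require a fresh approval. Granting write access removes the need for approval entirely.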

cc @guangy10 For regular ET contributors who have a proven track record, you could consider granting write access to the repo; then they would not need to ask for CI approval.

guangy10 commented 2 months ago

@huydhn Oh, this is sad, as it makes relying on pytorchbot to kick off trunk/periodic jobs meaningless. That doesn't sound like a good move, especially for ET, which is not in GH1st.

guangy10 commented 2 months ago

@cymbalrush FYI, I granted you the write permission to unblock.

cymbalrush commented 2 months ago

Thanks @guangy10 @huydhn !

cymbalrush commented 2 months ago

I am seeing failures unrelated to this PR; its changes only touch the CoreML runner. The failure is happening at:

packages/executorch/exir/passes/", line 45, in
    from executorch.exir.passes.quant_fusion_pass import QuantFusionPass
File "/Users/runner/work/_temp/conda_environment_8384223477/lib/python3.11/site-packages/executorch/exir/passes/", line 14, in
    from ._quant_patterns_and_replacements import get_quant_patterns_and_replacements
File "/Users/runner/work/_temp/conda_environment_8384223477/lib/python3.11/site-packages/executorch/exir/passes/", line 17, in
    from torchao.quantization.quant_primitives import quantized_decomposed_lib
File "/Users/runner/work/_temp/conda_environment_8384223477/lib/python3.11/site-packages/torchao/quantization/", line 7, in
    from .smoothquant import *  # noqa: F403
    ^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/runner/work/_temp/conda_environment_8384223477/lib/python3.11/site-packages/torchao/quantization/", line 17, in
    import torchao.quantization.quant_api as quant_api
File "/Users/runner/work/_temp/conda_environment_8384223477/lib/python3.11/site-packages/torchao/quantization/", line 25, in
    from .dynamic_quant import DynamicallyPerAxisQuantizedLinear
File "/Users/runner/work/_temp/conda_environment_8384223477/lib/python3.11/site-packages/torchao/quantization/", line 9, in
    from .quant_primitives import (
File "/Users/runner/work/_temp/conda_environment_8384223477/lib/python3.11/site-packages/torchao/quantization/", line 19, in
    from torchao.kernel.intmm_triton import int_scaled_matmul
File "/Users/runner/work/_temp/conda_environment_8384223477/lib/python3.11/site-packages/torchao/kernel/", line 6, in
    import triton
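
The traceback shows the import chain from `executorch.exir.passes` through torchao ending at an unconditional `import triton`, the "triton dep issue" mentioned later in the thread. One common Python pattern for treating such a dependency as optional is sketched below. It is a generic illustration, not the fix the ExecuTorch or torchao teams applied, and `optional_import` and `HAS_TRITON` are hypothetical names.

```python
import importlib
import importlib.util
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the module `name` if it is installed, else None (top-level names only)."""
    # find_spec returns None when no installed package provides `name`,
    # so a missing optional dependency does not raise at import time.
    if importlib.util.find_spec(name) is None:
        return None
    # If the package is installed but itself fails to import, let that error surface.
    return importlib.import_module(name)


# Hypothetical usage: gate Triton-only kernels on availability instead of
# importing triton unconditionally.
triton = optional_import("triton")
HAS_TRITON = triton is not None
```

Checking `find_spec` first, rather than catching every `ImportError`, keeps a genuinely broken installation from being silently treated as "not installed".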

cymbalrush commented 2 months ago

@pytorchbot label ciflow/trunk

guangy10 commented 2 months ago

@cymbalrush That's unrelated to your change; it's a known issue that the team is actively working on. You could rebase and test off the stable branch (`git checkout viable/strict`), but the stable branch is about a week behind, so I'm not sure it would help. Alternatively, if this is a small fix and you are confident in it, we can merge it without waiting for the triton dependency issue to be resolved.

facebook-github-bot commented 2 months ago

@shoumikhin has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot commented 2 months ago

@shoumikhin merged this pull request in pytorch/executorch@8532e79a6110857469e7cba2112dda729bb35490.