Open albanD opened 1 year ago
Are there any linear algebra ops not implemented in MPS that you have made custom shaders for? Any shaders I could "borrow" from your project (with full credit) and use in my own? Specifically, it would be helpful to have SVD and reverse-mode Cholesky operators.
Hey,
There are no custom shaders at the moment, as everything we needed for the basic networks we looked at was already provided by MPS (or a set of ops in MPS). Also, required functions that are not in the hot path simply fall back to the CPU for now.
It is mentioned here because it can be done easily within the integration, but it is not something that is used today.
I was testing a bunch of speech synthesis and vocoder models, and found the following operators missing so far:
aten::flip
aten::equal
aten::upsample_nearest1d.out
One vote for a CPU fallback for torch.bincount.
Is there any reason, given the unified memory architecture, that every op not implemented on Metal cannot just fall back to the CPU implementation without memory copy operations? (Based, of course, on my 10,000ft view of the architecture, which I'm sure is wildly oversimplified.)
Tip for everyone:
Run your script with PYTORCH_ENABLE_MPS_FALLBACK=1, which will fall back to the CPU for unsupported ops.
I'm using a custom build which merges pull request #77791 so am not sure if this is included in the current build (Edit: It's not. You need to build PyTorch yourself with the pull request or trust an online build with it).
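For anyone who would rather switch the fallback on from inside a script than from the shell, here is a minimal sketch. It assumes the variable has to be set before `torch` is first imported (the flag appears to be read when the library loads), and the helper name `enable_mps_fallback` is made up for illustration.

```python
import os
import sys


def enable_mps_fallback() -> bool:
    """Set PYTORCH_ENABLE_MPS_FALLBACK=1 for this process.

    Returns True if torch had not been imported yet, i.e. the setting is
    expected to take effect (assumption: the flag is read at load time).
    """
    torch_already_loaded = "torch" in sys.modules
    os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
    return not torch_already_loaded


# Call this at the very top of the script, before `import torch`.
effective = enable_mps_fallback()
if not effective:
    print("torch was imported first; restart with the variable set in the shell")
```

Setting the variable in the shell, as in the tip above, avoids the ordering question entirely.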
Testing with some huggingface transformers code: + 1 vote for aten::cumsum.out
Tried with the fallback env var but doesn't seem to work for me.
One missing op I ran into and haven't seen mentioned yet is aten::_unique2.
Edit: This error goes away when passing PYTORCH_ENABLE_MPS_FALLBACK=1 when using the current main branch build. However, instead I get warnings
The operator 'aten::nonzero' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/lukas/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
then
The dst MTL buffer in copy_to_mps is non-contiguous (Triggered internally at /Users/lukas/pytorch/aten/src/ATen/native/mps/operations/Copy.mm:323.)
and finally the forward pass through my model crashes with
RuntimeError: Placeholder buffer size (7493632) is not large enough to contain the Tensor storage of size 14986944
On cpu it works fine. Could be #77886 I suppose.
> Testing with some huggingface transformers code: + 1 vote for aten::cumsum.out
> Tried with the fallback env var but doesn't seem to work for me.
+1, setting PYTORCH_ENABLE_MPS_FALLBACK=1 still results in:
NotImplementedError: Could not run 'aten::cumsum.out' with arguments from the 'MPS' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::cumsum.out' is only available for these backends: [Dense, Conjugate, UNKNOWN_TENSOR_TYPE_ID, QuantizedXPU, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseCPU, SparseCUDA, SparseHIP, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseXPU, UNKNOWN_TENSOR_TYPE_ID, SparseVE, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, NestedTensorCUDA, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID].
CPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterCPU.cpp:37386 [kernel]
Meta: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterMeta.cpp:31637 [kernel]
BackendSelect: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:133 [backend fallback]
Named: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/NamedRegistrations.cpp:11 [kernel]
Conjugate: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ConjugateFallback.cpp:18 [backend fallback]
Negative: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/NegateFallback.cpp:18 [backend fallback]
ZeroTensor: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ZeroTensorFallback.cpp:86 [backend fallback]
ADInplaceOrView: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/ADInplaceOrViewType_1.cpp:3288 [kernel]
AutogradOther: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradCUDA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradXLA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradMPS: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradIPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradXPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradHPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradLazy: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradPrivateUse1: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradPrivateUse2: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradPrivateUse3: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
Tracer: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/TraceType_0.cpp:12585 [kernel]
AutocastCPU: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:481 [backend fallback]
Autocast: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:324 [backend fallback]
Batched: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/BatchingRegistrations.cpp:1064 [backend fallback]
VmapMode: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
Functionalize: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterFunctionalization_3.cpp:12118 [kernel]
PythonTLSSnapshot: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:137 [backend fallback]
@lhoenig could you open a new separate issue for the cpu fallback failing for you? The error seems to hint that you're moving a non-contiguous Tensor across devices. Making sure the Tensors are contiguous might help as a workaround. We can continue this discussion in the new issue you will create.
@Willian-Zhang the fallback is ONLY available if you build from source right now. It will be in the nightly build tomorrow (May 21st).
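For builds that do not have the environment variable yet, a per-call fallback can be approximated by hand. This is only a sketch: `run_with_cpu_fallback` is a hypothetical helper, not a PyTorch API, and it assumes the missing op surfaces as a `NotImplementedError` (as in the messages quoted in this thread) and that the wrapped function takes only tensors and returns a single tensor.

```python
import torch


def run_with_cpu_fallback(op, *tensors):
    """Try `op` on the tensors' own device; on NotImplementedError redo it on the CPU.

    Hypothetical helper for illustration only. The CPU result is copied back
    to the device of the first input, so every fallback costs two transfers.
    """
    try:
        return op(*tensors)
    except NotImplementedError:
        device = tensors[0].device
        cpu_result = op(*(t.cpu() for t in tensors))
        return cpu_result.to(device)


# Use MPS when this build exposes it and it is available, otherwise the CPU.
mps = getattr(torch.backends, "mps", None)
device = torch.device("mps" if mps is not None and mps.is_available() else "cpu")

x = torch.tensor([1.0, 2.0, 3.0], device=device)
y = run_with_cpu_fallback(lambda t: torch.cumsum(t, dim=0), x)
print(y.cpu())
```

Unlike the built-in fallback, this only covers the calls you wrap explicitly, so it is mostly useful for one or two known gaps in a model.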
Would like to add aten::_local_scalar_dense to the list. Also, is it possible to link to some examples in the top post on how we can implement these into Pytorch? I'd love to give it a shot if it's not too hard.
@albanD Yep, making the Tensors contiguous worked. But yet another issue revealed itself. I created #77977 and #78001.
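The contiguity workaround mentioned above can be sketched as follows. Which tensors needed it in the original model is not stated in the thread, so the transposed tensor here is just a stand-in for any non-contiguous view.

```python
import torch

# Use MPS when this build exposes it and it is available, otherwise the CPU.
mps = getattr(torch.backends, "mps", None)
device = torch.device("mps" if mps is not None and mps.is_available() else "cpu")

# A transposed view shares storage with `base` but is not contiguous.
base = torch.arange(6, dtype=torch.float32).reshape(2, 3)
view = base.t()
print(view.is_contiguous())

# Make a densely packed copy before crossing devices.
moved = view.contiguous().to(device)
print(moved.is_contiguous(), tuple(moved.shape))
```

`contiguous()` returns the tensor itself when it is already densely packed, so calling it defensively before `.to(device)` costs little.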
I've got a non supported op: aten::grid_sampler_2d
envs/pytorch-env/lib/python3.9/site-packages/torch/nn/functional.py:4172: UserWarning: The operator 'aten::grid_sampler_2d' is not currently supported on the MPS backend and will fall back to run on the CPU. This may performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
return torch.grid_sampler(input, grid, mode_enum, padding_mode_enum, align_corners)
Not supported
aten::l1_loss_backward.grad_input
aten::kl_div_backward
Code
import torch
from torch import nn

X, y = torch.rand(16, 10).to("mps"), torch.rand(16, 1).to("mps")
model = nn.Linear(10, 1).to("mps")
criterion = nn.L1Loss() # nn.KLDivLoss()
loss = criterion(model(X), y)
loss.backward()
Output
NotImplementedError: Could not run 'aten::l1_loss_backward.grad_input' with arguments from the 'MPS' backend
Trying to use affine crop from torchvision, and found the operator aten::linspace.out does not seem to be implemented with the MPS backend
Trying to use MPS backend with pytorch geometric, and found the operator aten::index.Tensor is not yet implemented.
Found the operator 'aten::grid_sampler_2d' is not current implemented for the MPS device.
Would be great to add aten::adaptive_max_pool2d to the list - seems to be fairly common and for me useful in some point cloud architectures.
I ran into this error with aten::count_nonzero.dim_IntList (via torch.count_nonzero()). I'll take a look at implementing this op with MPS.
The operator aten::lgamma.out is currently not implemented either.
NotImplementedError                       Traceback (most recent call last)
Input In [2], in <cell line: 10>()
      7 device = torch.device("mps")
      9 # Create random input and output data
---> 10 x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
     11 y = torch.sin(x)
     13 # Randomly initialize weights
NotImplementedError: The operator 'aten::linspace.out' is not current implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
Thank you !
I would like to add aten::linalg_householder_product.
Using orthogonal parametrization with PYTORCH_ENABLE_MPS_FALLBACK=1, I get:
Q = torch.linalg.householder_product(A, tau)
loc("mps_multiply"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/560148d7-a559-11ec-8c96-4add460b61a6/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":219:0)):
error: input types 'tensor<13x10xf32>' and 'tensor<1x10xi32>' are not broadcast compatible
Hi, please consider aten::avg_pool3d.out.
The operator aten::erfinv.out is not implemented.
The operator aten::logical_and.out is not current implemented for the MPS device.
The operator aten::bitwise_and.Tensor_out is not yet implemented for the MPS backend.
The operator 'aten::_slow_conv2d_forward' is not currently implemented for the MPS device.
Also found this:
NotImplementedError: Could not run 'aten::_copy_from_and_resize' with arguments from the 'CPU' backend. after enacting the PYTORCH_ENABLE_MPS_FALLBACK=1 env variable.
Got a message that aten::softplus.out is not supported. I'd need that to update OpenPifPaf.
> Would like to add aten::_local_scalar_dense to the list. Also, is it possible to link to some examples in the top post on how we can implement these into Pytorch? I'd love to give it a shot if it's not too hard.
You can use this as a guide: https://github.com/pytorch/pytorch/wiki/Adding-Op-for-MPS-Backend Please provide feedback if there is anything missing.
> Would like to add aten::_local_scalar_dense to the list. Also, is it possible to link to some examples in the top post on how we can implement these into Pytorch? I'd love to give it a shot if it's not too hard.
MPS backend already has support for aten::_local_scalar_dense (file https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/mps/operations/Scalar.mm). If you are still seeing the issue, could you please share the example you are trying to run?
The operator 'aten::_index_putimpl' is not current implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature,
'aten::_slow_conv2d_forward'
+1. How can I start contributing to have PyTorch implemented?
One vote for aten::_slow_conv2d_forward since mmdet's ssd implementation relies on it.
aten::_slow_conv2d_forward is not supported.
I am curious how the official benchmark on resnet-50 / vgg is measured? Any scripts or references?
'aten::softplus.out' is not supported. I was training a model with gpytorch and it showed up
I found that the operator aten::_ctc_loss is currently not implemented for the MPS device either.
aten::index_add.out is not supported.
batch_y_len[batch_y_len<=0] = 1
NotImplementedError: The operator 'aten::_index_putimpl' is not current implemented for the MPS device.
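Until the masked in-place assignment is supported, one possible way around it is to express the same update out of place with `torch.where`. This is a sketch under the assumption that the comparison and `torch.where` are available on the MPS build in use; the sample values are made up.

```python
import torch

# Use MPS when this build exposes it and it is available, otherwise the CPU.
mps = getattr(torch.backends, "mps", None)
device = torch.device("mps" if mps is not None and mps.is_available() else "cpu")

# Made-up lengths; the original data is not shown in the thread.
batch_y_len = torch.tensor([3, 0, -2, 5], device=device)

# Same values as `batch_y_len[batch_y_len <= 0] = 1`, but computed out of
# place, so it does not go through the masked in-place assignment.
batch_y_len = torch.where(
    batch_y_len <= 0, torch.ones_like(batch_y_len), batch_y_len
)
print(batch_y_len.cpu())
```

Note that this rebinds the name to a new tensor, so other references to the old tensor will not see the change.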
I receive an error 'aten::normal is not implemented for the MPS device' after training a VAE model.
After switching to the MPS device, it reports that the 'aten::cumsum.out' op is missing, so I set the environment variable 'PYTORCH_ENABLE_MPS_FALLBACK', but then I get the following error for a GPT-2 model:
/Users/lihua.llh/miniconda3/envs/torch-m1/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py:999: UserWarning: The operator 'aten::cumsum.out' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
position_ids = attention_mask.long().cumsum(-1) - 1
Traceback (most recent call last):
File "/Users/lihua.llh/Documents/codes/lab/python/gpt2_demo/inferences/beam_generation.py", line 115, in <module>
main()
File "/Users/lihua.llh/Documents/codes/lab/python/gpt2_demo/inferences/beam_generation.py", line 102, in main
outputs = model.generate(input_ids=input_ids, num_beams=5, max_length=500, num_return_sequences=2,
File "/Users/lihua.llh/miniconda3/envs/torch-m1/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/Users/lihua.llh/miniconda3/envs/torch-m1/lib/python3.8/site-packages/transformers/generation_utils.py", line 1344, in generate
return self.beam_search(
File "/Users/lihua.llh/miniconda3/envs/torch-m1/lib/python3.8/site-packages/transformers/generation_utils.py", line 2192, in beam_search
outputs = self(
File "/Users/lihua.llh/miniconda3/envs/torch-m1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/lihua.llh/miniconda3/envs/torch-m1/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 1046, in forward
transformer_outputs = self.transformer(
File "/Users/lihua.llh/miniconda3/envs/torch-m1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/lihua.llh/miniconda3/envs/torch-m1/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 889, in forward
outputs = block(
File "/Users/lihua.llh/miniconda3/envs/torch-m1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/lihua.llh/miniconda3/envs/torch-m1/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 390, in forward
attn_outputs = self.attn(
File "/Users/lihua.llh/miniconda3/envs/torch-m1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/lihua.llh/miniconda3/envs/torch-m1/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 312, in forward
query, key, value = self.c_attn(hidden_states).split(self.split_size, dim=2)
File "/Users/lihua.llh/miniconda3/envs/torch-m1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/lihua.llh/miniconda3/envs/torch-m1/lib/python3.8/site-packages/transformers/pytorch_utils.py", line 107, in forward
x = torch.addmm(self.bias, x.view(-1, x.size(-1)), self.weight)
RuntimeError: tensors must be 2-D
@liulhdarks could you open a new issue to discuss this? It looks like the error is independent of the cumsum issue. In the new issue, make sure to give details on the code you run, how to reproduce the error, and whether the code runs properly on CPU!
Found that aten::view_as_complex is not supported either. Using PYTORCH_ENABLE_MPS_FALLBACK=1 makes it possible to trigger a subsequent crash using slicing:
$ PYTORCH_ENABLE_MPS_FALLBACK=1 python3 -c 'import torch; print(torch.view_as_complex(torch.randn(1,4,2).to("mps"))[...,:-1,:])'
<string>:1: UserWarning: The operator 'aten::view_as_complex' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
<string>:1: UserWarning: 0The operator aten::view_as_complex appears to be a view operator, but it has no implementation for the backend "mps:0". View operators don't support falling back to run on the CPU, since the tensor's storage cannot be shared across devices. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/CPUFallback.cpp:175.)
libc++abi: terminating with uncaught exception of type c10::TypeError: Trying to convert ComplexFloat to the MPS backend but it does not have support for that dtype.
Exception raised from getMPSDataType at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/OperationUtils.mm:124 (most recent call first):
frame #0: at::native::mps::getMPSDataType(c10::ScalarType) + 452 (0x11142a080 in libtorch_cpu.dylib)
frame #1: invocation function for block in at::native::as_strided_tensorimpl_mps(at::Tensor const&, c10::ArrayRef<long long>, c10::ArrayRef<long long>, c10::optional<long long>) + 136 (0x111448e94 in libtorch_cpu.dylib)
frame #2: invocation function for block in at::native::mps::MPSGraphCache::CreateCachedGraph(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, at::native::mps::MPSCachedGraph* () block_pointer) + 216 (0x111437704 in libtorch_cpu.dylib)
@proger complex dtypes are not supported on MPS at all right now, I'm afraid. cc @kulinseth
@albanD any timeline for this?
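As long as complex dtypes are unavailable on the device, one option is to keep the data in its real-valued (real, imag) pair layout on MPS and only build the complex view on the CPU. A minimal sketch, assuming the slicing can be done on the pair layout; the shapes mirror the repro above but the slice is illustrative.

```python
import torch

# Use MPS when this build exposes it and it is available, otherwise the CPU.
mps = getattr(torch.backends, "mps", None)
device = torch.device("mps" if mps is not None and mps.is_available() else "cpu")

# Real-valued layout: the last dimension holds (real, imag) pairs.
pairs = torch.randn(1, 4, 2).to(device)

# Slice while still in the real layout (drops the last complex element).
trimmed = pairs[..., :-1, :]

# Only build the complex view after moving the data to the CPU.
z = torch.view_as_complex(trimmed.cpu().contiguous())
print(tuple(z.shape), z.dtype)
```

Anything that genuinely needs complex arithmetic then runs on the CPU, so this mainly helps when the complex step sits at the edge of the model.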
I ran into this finetuning mT5
NotImplementedError: The operator 'aten::_index_putimpl' is not current implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
Tried setting the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1, because aten::cumsum.out is currently not yet implemented, however I got the following error after setting the environment variable & trying to run a XGLM huggingface model:
/opt/homebrew/Caskroom/miniforge/base/envs/incoder-env/lib/python3.9/site-packages/transformers/models/xglm/modeling_xglm.py:155: UserWarning: The operator 'aten::cumsum.out' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
incremental_indices = (torch.cumsum(mask, dim=1).type_as(mask) + past_key_values_length) * mask
/AppleInternal/Library/BuildRoots/b6051351-c030-11ec-96e9-3e7866fcf3a1/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm:343: failed assertion `unsupported datatype for constant'
Process finished with exit code 134 (interrupted by signal 6: SIGABRT)
@okpatil4u PyTorch has more than 2000 different operators. So full support will definitely take quite a while. That's why we have this issue to help us prioritize which ones we're working on first.
Wow ! Thank you @albanD for the amazing work that you and your team put into Pytorch.
The operator 'aten::_slow_conv2d_forward' is not current implemented for the MPS device
I have found that aten::slow_conv_transpose2d.out is not implemented for the MPS device.
Code
device = torch.device('mps')  # the device string was empty in the post; 'mps' is assumed from the error below
z = torch.randn(25, 100, 1, 1).to(device)
out = gen(z)
show_tensor_images(out, num_images=25)
show_tensor_images(real, num_images=25, title='Real Images')
Error Message
NotImplementedError: The operator 'aten::slow_conv_transpose2d.out' is not current implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
Gated Linear Units are typically included in TabNet models; it would be great to add MPS support for aten::glu.out. Getting an error message from Ludwig ML default TabNet:
File "/Users/anne/ludwig_env/lib/python3.9/site-packages/ludwig/modules/tabnet_modules.py", line 205, in forward
hidden = nn.functional.glu(hidden, dim=-1) # [bs, s]
File "/Users/anne/ludwig_env/lib/python3.9/site-packages/torch/nn/functional.py", line 1451, in glu
return torch._C._nn.glu(input, dim)
NotImplementedError: The operator 'aten::glu.out' is not current implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
This issue is to have a centralized place to list and track work on adding support to new ops for the MPS backend.
MPS operators coverage matrix - The matrix covers most of the supported operators but is not exhaustive. Before you comment below, please take a look at this matrix to make sure the operator you're requesting has not been implemented in nightly. More details can be found on the readme.
There are a very large number of operators in PyTorch, so they are not all implemented yet for the MPS backend, as it is still in the prototype phase. We will be prioritizing adding new operators based on user feedback. If possible, please also provide a link to the network or use case where this op is used.
If you want to work on adding support for such an op, feel free to comment below to get assigned one. Please avoid picking up an op that is already being worked on or that already has a PR associated with it.
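When collecting ops to report, the operator name can be pulled out of the error text automatically. The sketch below is an illustration only: the helper `missing_op_name` is hypothetical, and the pattern covers just the two message styles quoted in this thread.

```python
import re

# Matches both message styles quoted in this thread:
#   "The operator 'aten::foo' is not current implemented for the MPS device"
#   "Could not run 'aten::foo' with arguments from the 'MPS' backend"
_OP_PATTERN = re.compile(r"(?:The operator|Could not run) '(aten::[\w.]+)'")


def missing_op_name(message: str):
    """Return the operator name found in an error message, or None."""
    match = _OP_PATTERN.search(message)
    return match.group(1) if match else None


msg = (
    "The operator 'aten::linspace.out' is not current implemented "
    "for the MPS device."
)
print(missing_op_name(msg))  # aten::linspace.out
```

Wrapping a forward pass in `try/except NotImplementedError` and feeding `str(exc)` to this helper gives the name to look up in the coverage matrix before commenting.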
Link to the wiki for details on how to add these ops and example PRs.
Good First Issue: Below is a list of ops that are good starting points for adding operations to the MPS backend. Please consider picking them up.
nn.Conv3D
aten::_weight_norm_interface
aten::max_unpool2d
aten::cummin.out, aten::cummax.out
aten::upsample_linear1d.out
aten::lerp.Scalar_out
aten::renorm
Not categorized: These are the ops which are not yet picked up and need MPS implementation.
aten::slow_conv3d_forward
aten::_ctc_loss
aten::avg_pool3d.out
aten::linalg_qr.out
aten::multilabel_margin_loss_forward
aten::unique_dim
aten::_sample_dirichlet
aten::_fft_r2c
aten::upsample_bicubic2d.out
aten::linalg_inv_out_helper
aten::bucketize
aten::_embedding_bag
aten::_standard_gamma
aten::_upsample_bicubic2d_aa.out
aten::_symeig_helper
aten::linalg_matrix_exp
aten::_nested_tensor_from_mask
aten::randperm.generator_out
aten::_fused_sdp_choice
aten::linalg_cholesky_ex
aten::scatter_reduce.two_out
aten::kthvalue.values
aten::_linalg_solve_ex.result
aten::grid_sampler_2d_backward
max_pool3d (unfinished attempt https://github.com/pytorch/pytorch/pull/102148)
WIP:
aten::kl_div_backward (Is not needed)
Implemented Ops: Ops that have MPS backend implementations.
See MPS operators coverage matrix and the readme for more details.
deprecated list
- [x] `aten::histc` #96652
- [x] `pow.Scalar_out` (@qqaatw)
- [x] `aten::log_sigmoid_forward` (@qqaatw)
- [x] `aten::fmax.out` (@qqaatw)
- [x] `aten::roll` https://github.com/pytorch/pytorch/pull/95168
- [x] `aten::hardsigmoid` (@qqaatw)
- [x] `aten::logit` (@qqaatw)
- [x] `linalg_solve_triangular`
- [x] `aten::sort.values_stable` https://github.com/pytorch/pytorch/issues/86750
- [x] `aten::remainder.Tensor_out` https://github.com/pytorch/pytorch/issues/86806
- [x] `aten::hardswish` https://github.com/pytorch/pytorch/issues/86807
- [x] `aten::nansum` https://github.com/pytorch/pytorch/issues/86809
- [x] `aten::fmod.Tensor_out` https://github.com/pytorch/pytorch/issues/86810
- [x] `aten::range` https://github.com/pytorch/pytorch/issues/86990
- [x] `aten::argsort` https://github.com/pytorch/pytorch/issues/86991
- [x] `aten::repeat_interleave` https://github.com/pytorch/pytorch/issues/87219
- [x] `aten::median` https://github.com/pytorch/pytorch/issues/87220
- [x] `aten::trace` https://github.com/pytorch/pytorch/issues/87221
- [x] `aten::im2col` (Falling back to CPU as it's mostly used in preprocessing layers)
- [x] `aten::_cdist_forward` https://github.com/pytorch/pytorch/pull/91643
- [x] `aten::native_group_norm_backward` (Implemented by @malfet)
- [x] `aten::grid_sampler_2d` (https://github.com/pytorch/pytorch/pull/94273)
- [x] `aten::upsample_nearest1d_backward.grad_input`
- [x] `aten::upsample_nearest1d.out`
- [x] `aten::repeat_interleave.self_int`
- [x] `aten::nan_to_num.out`
- [x] `aten::unique_consecutive` https://github.com/pytorch/pytorch/pull/88532
- [x] `torch.bincount` https://github.com/pytorch/pytorch/pull/91267
- [x] `aten::_unique2` https://github.com/pytorch/pytorch/pull/88532
- [x] `aten::unfold` https://github.com/pytorch/pytorch/pull/91266
- [x] `aten::triangular_solve.X` https://github.com/pytorch/pytorch/pull/94345
- [x] `aten::nonzero` https://github.com/pytorch/pytorch/pull/91616
- [x] `aten::_index_put_impl_` (https://github.com/pytorch/pytorch/pull/85672)
- [x] `aten::amax.out` (#79682)
- [x] `aten::_slow_conv2d_forward` (https://github.com/pytorch/pytorch/pull/86303)
- [x] `aten::eye.m_out` (https://github.com/pytorch/pytorch/pull/78408)
- [x] `aten::multinomial` (https://github.com/pytorch/pytorch/pull/80760)
- [x] `aten::flip` (#80214)
- [x] `aten::equal` https://github.com/pytorch/pytorch/pull/80195
- [x] `aten::_local_scalar_dense`
- [x] `aten::l1_loss_backward.grad_input` (#80010)
- [x] `aten::glu.out` (#79866)
- [x] `aten::linspace.out` https://github.com/pytorch/pytorch/pull/78570
- [x] `aten::arange.out` https://github.com/pytorch/pytorch/pull/78789
- [x] `aten::adaptive_max_pool2d` https://github.com/pytorch/pytorch/pull/78410
- [x] `aten::count_nonzero.dim_IntList`
- [x] `aten::softplus.out` (https://github.com/pytorch/pytorch/pull/78930)
- [x] `aten::index_add.out` https://github.com/pytorch/pytorch/pull/79935
- [x] `aten::normal` (#80297)
- [x] `aten::native_layer_norm_backward` https://github.com/pytorch/pytorch/pull/79189
- [x] `aten::logical_and.out` (#80216)
- [x] `aten::frac.out` (https://github.com/pytorch/pytorch/pull/86625)
- [x] `aten::masked_select` https://github.com/pytorch/pytorch/pull/85818
- [x] `aten::softplus_backward.grad_input` (#79873)
- [x] `aten::slow_conv_transpose2d.out` (@malfet could be due to incompatibility with torchvision)
- [x] `aten::signbit.out` (https://github.com/pytorch/pytorch/pull/87214)
- [x] `aten::cumsum.out` (https://github.com/pytorch/pytorch/pull/88319)
- [x] `aten::cumprod.out`
- [x] `aten::expm1.out` (https://github.com/pytorch/pytorch/pull/87147)
- [x] `aten::bitwise_xor.Tensor_out` (https://github.com/pytorch/pytorch/pull/82307)
- [x] `aten::bitwise_and.Tensor_out` (https://github.com/pytorch/pytorch/pull/82307)
- [x] `aten::bitwise_or.Tensor_out` (https://github.com/pytorch/pytorch/pull/82307)
- [x] `aten::index.Tensor` (https://github.com/pytorch/pytorch/pull/82507)
- [x] `aten::index.Tensor_out` (https://github.com/pytorch/pytorch/pull/82507)
Ops not supported by MPS: Ops that will require either to use the CPU fallback system or a custom Metal kernel.
aten::lgamma.out
aten::linalg_householder_product