General MPS op coverage tracking issue

albanD commented 1 year ago

This issue is to have a centralized place to list and track work on adding support to new ops for the MPS backend.

MPS operators coverage matrix - The matrix covers most of the supported operators but is not exhaustive. Before you comment below, please take a look at this matrix to make sure the operator you're requesting has not already been implemented in nightly. More details can be found in the readme.

PyTorch has a very large number of operators, and not all of them are implemented for the MPS backend yet, as it is still in the prototype phase. We will prioritize adding new operators based on user feedback. If possible, please also provide a link to the network or use case where the op is used.

If you want to work on adding support for such an op, feel free to comment below to get one assigned. Please avoid picking up an op that is already being worked on or that already has a PR associated with it.

Link to the wiki for details on how to add these ops and example PRs.

Good First Issue: Below is a list of ops that are a good starting point for adding operations to the MPS backend. Please consider picking them up.

[ ] nn.Conv3D
[x] aten::_weight_norm_interface
[ ] aten::max_unpool2d
[ ] aten::cummin.out, aten::cummax.out
[ ] aten::upsample_linear1d.out
[x] aten::lerp.Scalar_out
[x] aten::renorm

Not categorized: These are the ops that have not been picked up yet and need an MPS implementation.

[ ] aten::slow_conv3d_forward
[ ] aten::_ctc_loss
[ ] aten::avg_pool3d.out
[ ] aten::linalg_qr.out
[ ] aten::multilabel_margin_loss_forward
[ ] aten::unique_dim
[ ] aten::_sample_dirichlet
[x] aten::_fft_r2c
[ ] aten::upsample_bicubic2d.out
[ ] aten::linalg_inv_out_helper
[x] aten::bucketize
[ ] aten::_embedding_bag
[ ] aten::_standard_gamma
[ ] aten::_upsample_bicubic2d_aa.out
[ ] aten::_symeig_helper
[ ] aten::linalg_matrix_exp
[ ] aten::_nested_tensor_from_mask
[x] aten::randperm.generator_out
[ ] aten::_fused_sdp_choice
[ ] aten::linalg_cholesky_ex
[ ] aten::scatter_reduce.two_out
[ ] aten::kthvalue.values
[ ] aten::_linalg_solve_ex.result
[ ] aten::grid_sampler_2d_backward
[ ] max_pool3d (unfinished attempt https://github.com/pytorch/pytorch/pull/102148)

WIP:

[ ] aten::kl_div_backward (is not needed)

Implemented Ops: Ops that have MPS backend implementations.

See MPS operators coverage matrix and the readme for more details.

deprecated list

- [x] `aten::histc` #96652
- [x] `pow.Scalar_out` (@qqaatw)
- [x] `aten::log_sigmoid_forward` (@qqaatw)
- [x] `aten::fmax.out` (@qqaatw)
- [x] `aten::roll` https://github.com/pytorch/pytorch/pull/95168
- [x] `aten::hardsigmoid` (@qqaatw)
- [x] `aten::logit` (@qqaatw)
- [x] `linalg_solve_triangular`
- [x] `aten::sort.values_stable` https://github.com/pytorch/pytorch/issues/86750
- [x] `aten::remainder.Tensor_out` https://github.com/pytorch/pytorch/issues/86806
- [x] `aten::hardswish` https://github.com/pytorch/pytorch/issues/86807
- [x] `aten::nansum` https://github.com/pytorch/pytorch/issues/86809
- [x] `aten::fmod.Tensor_out` https://github.com/pytorch/pytorch/issues/86810
- [x] `aten::range` https://github.com/pytorch/pytorch/issues/86990
- [x] `aten::argsort` https://github.com/pytorch/pytorch/issues/86991
- [x] `aten::repeat_interleave` https://github.com/pytorch/pytorch/issues/87219
- [x] `aten::median` https://github.com/pytorch/pytorch/issues/87220
- [x] `aten::trace` https://github.com/pytorch/pytorch/issues/87221
- [x] `aten::im2col` (Falling back to CPU as it's mostly used in preprocessing layers)
- [x] `aten::_cdist_forward` https://github.com/pytorch/pytorch/pull/91643
- [x] `aten::native_group_norm_backward` (Implemented by @malfet)
- [x] `aten::grid_sampler_2d` (https://github.com/pytorch/pytorch/pull/94273)
- [x] `aten::upsample_nearest1d_backward.grad_input`
- [x] `aten::upsample_nearest1d.out`
- [x] `aten::repeat_interleave.self_int`
- [x] `aten::nan_to_num.out`
- [x] `aten::unique_consecutive` https://github.com/pytorch/pytorch/pull/88532
- [x] `torch.bincount` https://github.com/pytorch/pytorch/pull/91267
- [x] `aten::_unique2` https://github.com/pytorch/pytorch/pull/88532
- [x] `aten::unfold` https://github.com/pytorch/pytorch/pull/91266
- [x] `aten::triangular_solve.X` https://github.com/pytorch/pytorch/pull/94345
- [x] `aten::nonzero` https://github.com/pytorch/pytorch/pull/91616
- [x] `aten::_index_put_impl_` (https://github.com/pytorch/pytorch/pull/85672)
- [x] `aten::amax.out` (#79682)
- [x] `aten::_slow_conv2d_forward` (https://github.com/pytorch/pytorch/pull/86303)
- [x] `aten::eye.m_out` (https://github.com/pytorch/pytorch/pull/78408)
- [x] `aten::multinomial` (https://github.com/pytorch/pytorch/pull/80760)
- [x] `aten::flip` (#80214)
- [x] `aten::equal` https://github.com/pytorch/pytorch/pull/80195
- [x] `aten::_local_scalar_dense`
- [x] `aten::l1_loss_backward.grad_input` (#80010)
- [x] `aten::glu.out` (#79866)
- [x] `aten::linspace.out` https://github.com/pytorch/pytorch/pull/78570
- [x] `aten::arange.out` https://github.com/pytorch/pytorch/pull/78789
- [x] `aten::adaptive_max_pool2d` https://github.com/pytorch/pytorch/pull/78410
- [x] `aten::count_nonzero.dim_IntList`
- [x] `aten::softplus.out` (https://github.com/pytorch/pytorch/pull/78930)
- [x] `aten::index_add.out` https://github.com/pytorch/pytorch/pull/79935
- [x] `aten::normal` (#80297)
- [x] `aten::native_layer_norm_backward` https://github.com/pytorch/pytorch/pull/79189
- [x] `aten::logical_and.out` (#80216)
- [x] `aten::frac.out` (https://github.com/pytorch/pytorch/pull/86625)
- [x] `aten::masked_select` https://github.com/pytorch/pytorch/pull/85818
- [x] `aten::softplus_backward.grad_input` (#79873)
- [x] `aten::slow_conv_transpose2d.out` (@malfet could be due to incompatibility with torchvision)
- [x] `aten::signbit.out` (https://github.com/pytorch/pytorch/pull/87214)
- [x] `aten::cumsum.out` (https://github.com/pytorch/pytorch/pull/88319)
- [x] `aten::cumprod.out`
- [x] `aten::expm1.out` (https://github.com/pytorch/pytorch/pull/87147)
- [x] `aten::bitwise_xor.Tensor_out` (https://github.com/pytorch/pytorch/pull/82307)
- [x] `aten::bitwise_and.Tensor_out` (https://github.com/pytorch/pytorch/pull/82307)
- [x] `aten::bitwise_or.Tensor_out` (https://github.com/pytorch/pytorch/pull/82307)
- [x] `aten::index.Tensor` (https://github.com/pytorch/pytorch/pull/82507)
- [x] `aten::index.Tensor_out` (https://github.com/pytorch/pytorch/pull/82507)

Ops not supported by MPS: Ops that will require either the CPU fallback system or a custom Metal kernel.

[ ] aten::lgamma.out
[ ] aten::linalg_householder_product

philipturner commented 1 year ago

Are there any linear algebra ops not implemented in MPS that you have made custom shaders for? Any shaders I could "borrow" from your project (with full credit) and use in my own? Specifically, it would be helpful to have SVD and reverse-mode Cholesky operators.

albanD commented 1 year ago

Hey,

There are no custom shaders at the moment, as everything we needed for the basic networks we looked at was already provided by MPS (or a set of ops in MPS). Also, required functions that are not in the hot path simply fall back to CPU for now.

Custom shaders are mentioned here because they would be easy to add within the integration, but they are not something that is used today.

pzelasko commented 1 year ago

I was testing a bunch of speech synthesis and vocoder models, and found the following operators missing so far:

aten::flip
aten::equal
aten::upsample_nearest1d.out

Linux-cpp-lisp commented 1 year ago

One vote for a CPU fallback for torch.bincount.

Is there any reason, given the unified memory architecture, that every op not implemented on Metal cannot just fall back to the CPU implementation without memory copy operations? (Based, of course, on my 10,000ft view of the architecture, which I'm sure is wildly oversimplified.)

richardburleigh commented 1 year ago

Tip for everyone:

Run your script with PYTORCH_ENABLE_MPS_FALLBACK=1, which will fall back to the CPU for unsupported ops.

I'm using a custom build which merges pull request #77791 so am not sure if this is included in the current build (Edit: It's not. You need to build PyTorch yourself with the pull request or trust an online build with it).
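A minimal sketch of setting the flag from inside a Python script instead of on the command line. It assumes the variable is read when torch is first loaded, so it is set before the import. `pick_device` is a hypothetical helper name used for illustration, not part of PyTorch.

```python
import os

# Set the flag before torch is imported (assumption: the MPS fallback
# setting is read once, when the library is loaded).
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")


def pick_device():
    """Return "mps" when the MPS backend is usable, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older builds may not expose torch.backends.mps at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print(device)
```

Using `setdefault` keeps any value already exported in the shell, so the script does not override an explicit `PYTORCH_ENABLE_MPS_FALLBACK=0`.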

gautierdag commented 1 year ago

Testing with some huggingface transformers code: +1 vote for aten::cumsum.out. Tried with the fallback env var, but it doesn't seem to work for me.

lhoenig commented 1 year ago

One missing op I ran into and haven't seen mentioned yet is aten::_unique2. Edit: This error goes away when passing PYTORCH_ENABLE_MPS_FALLBACK=1 when using the current main branch build. However, instead I get warnings

The operator 'aten::nonzero' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at  /Users/lukas/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)

then

The dst MTL buffer in copy_to_mps is non-contiguous (Triggered internally at  /Users/lukas/pytorch/aten/src/ATen/native/mps/operations/Copy.mm:323.)

and finally the forward pass through my model crashes with

RuntimeError: Placeholder buffer size (7493632) is not large enough to contain the Tensor storage of size 14986944

On cpu it works fine. Could be #77886 I suppose.

Willian-Zhang commented 1 year ago

Testing with some huggingface transformers code: +1 vote for aten::cumsum.out. Tried with the fallback env var, but it doesn't seem to work for me.

+1, setting PYTORCH_ENABLE_MPS_FALLBACK=1 still results in:

NotImplementedError: Could not run 'aten::cumsum.out' with arguments from the 'MPS' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::cumsum.out' is only available for these backends: [Dense, Conjugate, UNKNOWN_TENSOR_TYPE_ID, QuantizedXPU, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseCPU, SparseCUDA, SparseHIP, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseXPU, UNKNOWN_TENSOR_TYPE_ID, SparseVE, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, NestedTensorCUDA, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID].

CPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterCPU.cpp:37386 [kernel]
Meta: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterMeta.cpp:31637 [kernel]
BackendSelect: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:133 [backend fallback]
Named: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/NamedRegistrations.cpp:11 [kernel]
Conjugate: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ConjugateFallback.cpp:18 [backend fallback]
Negative: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/NegateFallback.cpp:18 [backend fallback]
ZeroTensor: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ZeroTensorFallback.cpp:86 [backend fallback]
ADInplaceOrView: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/ADInplaceOrViewType_1.cpp:3288 [kernel]
AutogradOther: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradCUDA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradXLA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradMPS: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradIPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradXPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradHPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradLazy: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradPrivateUse1: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradPrivateUse2: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradPrivateUse3: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
Tracer: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/TraceType_0.cpp:12585 [kernel]
AutocastCPU: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:481 [backend fallback]
Autocast: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:324 [backend fallback]
Batched: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/BatchingRegistrations.cpp:1064 [backend fallback]
VmapMode: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
Functionalize: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterFunctionalization_3.cpp:12118 [kernel]
PythonTLSSnapshot: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:137 [backend fallback]

albanD commented 1 year ago

@lhoenig could you open a new separate issue for the CPU fallback failing for you? The error seems to hint that you're moving a non-contiguous Tensor across devices. Making sure the Tensors are contiguous might help as a workaround. We can continue this discussion in the new issue you will create.

@Willian-Zhang the fallback is ONLY available if you build from source right now. It will be in the nightly build tomorrow (May 21st).
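The contiguity workaround suggested to @lhoenig above could look roughly like the sketch below. `to_device_contiguous` is a hypothetical helper name, and the transposed tensor is only a stand-in for whatever non-contiguous tensor a model produces.

```python
import torch


def to_device_contiguous(t, device):
    # .contiguous() is a no-op for tensors that are already contiguous,
    # otherwise it materializes a densely packed copy before the transfer.
    return t.contiguous().to(device)


# A transpose yields a non-contiguous view of the same storage.
x = torch.arange(12).reshape(3, 4).t()
print(x.is_contiguous())

mps = getattr(torch.backends, "mps", None)
device = "mps" if mps is not None and mps.is_available() else "cpu"

y = to_device_contiguous(x, device)
print(y.is_contiguous())
```

Whether this avoids the crash depends on the build; it only removes the non-contiguous copy that the warning points at.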

weiji14 commented 1 year ago

Would like to add aten::_local_scalar_dense to the list. Also, is it possible to link to some examples in the top post on how we can implement these into Pytorch? I'd love to give it a shot if it's not too hard.

lhoenig commented 1 year ago

@albanD Yep, making the Tensors contiguous worked. But yet another issue revealed itself. I created #77977 and #78001.

psobolewskiPhD commented 1 year ago

I've got a non supported op: aten::grid_sampler_2d

envs/pytorch-env/lib/python3.9/site-packages/torch/nn/functional.py:4172: UserWarning: The operator 'aten::grid_sampler_2d' is not currently supported on the MPS backend and will fall back to run on the CPU. This may performance implications. (Triggered internally at  /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
  return torch.grid_sampler(input, grid, mode_enum, padding_mode_enum, align_corners)

thipokKub commented 1 year ago

Not supported

aten::l1_loss_backward.grad_input
aten::kl_div_backward

Code

import torch
from torch import nn

X, y = torch.rand(16, 10).to("mps"), torch.rand(16, 1).to("mps")
model = nn.Linear(10, 1).to("mps")
criterion = nn.L1Loss() # nn.KLDivLoss()
loss = criterion(model(X), y)
loss.backward()

Output

NotImplementedError: Could not run 'aten::l1_loss_backward.grad_input' with arguments from the 'MPS' backend

tw-ilson commented 1 year ago

Trying to use affine crop from torchvision, and found the operator aten::linspace.out does not seem to be implemented with the MPS backend

nicolasbeglinger commented 1 year ago

Trying to use MPS backend with pytorch geometric, and found the operator aten::index.Tensor is not yet implemented.

feesta commented 1 year ago

Found the operator 'aten::grid_sampler_2d' is not current implemented for the MPS device.

mooey5775 commented 1 year ago

Would be great to add aten::adaptive_max_pool2d to the list - seems to be fairly common and for me useful in some point cloud architectures.

RohanM commented 1 year ago

I ran into this error with aten::count_nonzero.dim_IntList (via torch.count_nonzero()). I'll take a look at implementing this op with MPS.

arnauqb commented 1 year ago

The operator aten::lgamma.out is currently not implemented either.

GeoffreyBrunet commented 1 year ago

Hello, the operator linspace is not implemented. Here is my error message:

NotImplementedError                       Traceback (most recent call last)
Input In [2], in <cell line: 10>()
      7 device = torch.device("mps")
      9 # Create random input and output data
---> 10 x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
     11 y = torch.sin(x)
     13 # Randomly initialize weights

NotImplementedError: The operator 'aten::linspace.out' is not current implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

Thank you!
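When reporting a missing op, it helps to quote the exact operator name so it can be checked against the coverage matrix. A small sketch that pulls the name out of an error message like the ones posted in this thread; `missing_op` is a hypothetical helper, and the pattern assumes the name appears in single quotes with an `aten::` prefix, as it does in the messages above.

```python
import re

# Matches a quoted operator name such as 'aten::linspace.out'.
_OP_PATTERN = re.compile(r"'(aten::[\w.]+)'")


def missing_op(message):
    """Return the first quoted aten:: operator name in message, or None."""
    match = _OP_PATTERN.search(message)
    return match.group(1) if match else None


msg = (
    "The operator 'aten::linspace.out' is not current implemented "
    "for the MPS device."
)
print(missing_op(msg))
```

In a script, the message would typically come from `str(err)` inside an `except NotImplementedError as err:` block.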

chrislemke commented 1 year ago

I would like to add aten::linalg_householder_product. Using orthogonal parametrization with PYTORCH_ENABLE_MPS_FALLBACK=1, I get:

Q = torch.linalg.householder_product(A, tau)
loc("mps_multiply"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/560148d7-a559-11ec-8c96-4add460b61a6/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":219:0)): 
error: input types 'tensor<13x10xf32>' and 'tensor<1x10xi32>' are not broadcast compatible

marioem commented 1 year ago

Hi, please consider aten::avg_pool3d.out.

sabify commented 1 year ago

The operator aten::erfinv.out is not implemented.

Tony-Tan commented 1 year ago

The operator aten::logical_and.out is not current implemented for the MPS device.

nicolasbeglinger commented 1 year ago

The operator aten::bitwise_and.Tensor_out is not yet implemented for the MPS backend.

jacoppock commented 1 year ago

The operator 'aten::_slow_conv2d_forward' is not currently implemented for the MPS device.

Also found this after setting the PYTORCH_ENABLE_MPS_FALLBACK=1 env variable:

NotImplementedError: Could not run 'aten::_copy_from_and_resize' with arguments from the 'CPU' backend.

svenkreiss commented 1 year ago

Got a message that aten::softplus.out is not supported. I'd need that to update OpenPifPaf.

kulinseth commented 1 year ago

Would like to add aten::_local_scalar_dense to the list. Also, is it possible to link to some examples in the top post on how we can implement these into Pytorch? I'd love to give it a shot if it's not too hard.

You can use this as a guide: https://github.com/pytorch/pytorch/wiki/Adding-Op-for-MPS-Backend Please provide feedback if there is anything missing.

DenisVieriu97 commented 1 year ago

Would like to add aten::_local_scalar_dense to the list. Also, is it possible to link to some examples in the top post on how we can implement these into Pytorch? I'd love to give it a shot if it's not too hard.

MPS backend already has support for aten::_local_scalar_dense (file https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/mps/operations/Scalar.mm). If you are still seeing the issue, could you please share the example you are trying to run?
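For readers wondering where this op shows up in user code: to my understanding, `aten::_local_scalar_dense` is the op behind `Tensor.item()`, which copies a single-element tensor to a Python number. A minimal sketch on a CPU tensor, with illustrative values:

```python
import torch

# Reducing to a single element and calling .item() reads the scalar
# back to the host as a plain Python number.
loss = torch.tensor([0.25, 0.75]).sum()
value = loss.item()
print(type(value).__name__, value)
```

The same call on a tensor living on the "mps" device is what would exercise the MPS implementation mentioned above.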

hijkzzz commented 1 year ago

The operator 'aten::_index_put_impl_' is not current implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature,

anirbansaha96 commented 1 year ago

'aten::_slow_conv2d_forward' +1. How can I start contributing to get this implemented in PyTorch?

JiaxinLi-lipluszn commented 1 year ago

One vote for aten::_slow_conv2d_forward since mmdet's ssd implementation relies on it.

Lyken17 commented 1 year ago

aten::_slow_conv2d_forward is not supported.

I am curious how the official benchmark on resnet-50 / vgg is measured? Any scripts or references?

grafaelw commented 1 year ago

'aten::softplus.out' is not supported. I was training a model with gpytorch and it showed up

jiogenes commented 1 year ago

I found that the operator aten::_ctc_loss is currently not implemented for the MPS device either.

ShawnDiego commented 1 year ago

aten::index_add.out is not supported.

tidalmelon commented 1 year ago

batch_y_len[batch_y_len<=0] = 1 NotImplementedError: The operator 'aten::_index_put_impl_' is not current implemented for the MPS device.
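Until the op is available, a masked assignment like the one above can sometimes be rewritten without boolean-mask indexing. A sketch using `torch.where`, with made-up values for `batch_y_len`. Whether `torch.where` runs natively on MPS in a given build is an assumption to check, not something this thread confirms.

```python
import torch

batch_y_len = torch.tensor([3, 0, -2, 5])

# Same result as `batch_y_len[batch_y_len <= 0] = 1`, but out of place:
# pick 1 where the condition holds, keep the original value elsewhere.
fixed = torch.where(batch_y_len <= 0, torch.ones_like(batch_y_len), batch_y_len)
print(fixed.tolist())
```

Note that this returns a new tensor instead of modifying `batch_y_len` in place, so any code holding a reference to the original tensor will not see the change.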

grafaelw commented 1 year ago

I receive an error 'aten::normal is not implemented for the MPS device' after training a VAE model.

liulhdarks commented 1 year ago

When using the MPS device, PyTorch reports that the 'aten::cumsum.out' op is missing, so I set the environment variable 'PYTORCH_ENABLE_MPS_FALLBACK', but then I get the following error for a GPT-2 model:

/Users/lihua.llh/miniconda3/envs/torch-m1/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py:999: UserWarning: The operator 'aten::cumsum.out' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at  /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
  position_ids = attention_mask.long().cumsum(-1) - 1
Traceback (most recent call last):
  File "/Users/lihua.llh/Documents/codes/lab/python/gpt2_demo/inferences/beam_generation.py", line 115, in <module>
    main()
  File "/Users/lihua.llh/Documents/codes/lab/python/gpt2_demo/inferences/beam_generation.py", line 102, in main
    outputs = model.generate(input_ids=input_ids, num_beams=5, max_length=500, num_return_sequences=2,
  File "/Users/lihua.llh/miniconda3/envs/torch-m1/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/Users/lihua.llh/miniconda3/envs/torch-m1/lib/python3.8/site-packages/transformers/generation_utils.py", line 1344, in generate
    return self.beam_search(
  File "/Users/lihua.llh/miniconda3/envs/torch-m1/lib/python3.8/site-packages/transformers/generation_utils.py", line 2192, in beam_search
    outputs = self(
  File "/Users/lihua.llh/miniconda3/envs/torch-m1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/lihua.llh/miniconda3/envs/torch-m1/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 1046, in forward
    transformer_outputs = self.transformer(
  File "/Users/lihua.llh/miniconda3/envs/torch-m1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/lihua.llh/miniconda3/envs/torch-m1/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 889, in forward
    outputs = block(
  File "/Users/lihua.llh/miniconda3/envs/torch-m1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/lihua.llh/miniconda3/envs/torch-m1/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 390, in forward
    attn_outputs = self.attn(
  File "/Users/lihua.llh/miniconda3/envs/torch-m1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/lihua.llh/miniconda3/envs/torch-m1/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 312, in forward
    query, key, value = self.c_attn(hidden_states).split(self.split_size, dim=2)
  File "/Users/lihua.llh/miniconda3/envs/torch-m1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/lihua.llh/miniconda3/envs/torch-m1/lib/python3.8/site-packages/transformers/pytorch_utils.py", line 107, in forward
    x = torch.addmm(self.bias, x.view(-1, x.size(-1)), self.weight)
RuntimeError: tensors must be 2-D

albanD commented 1 year ago

@liulhdarks could you open a new issue to discuss this? It looks like the error is independent of the cumsum issue. Make sure to give details in the new issue on the code you run, how to reproduce, and whether the code runs properly on CPU!

proger commented 1 year ago

Found that aten::view_as_complex is not supported either. Using PYTORCH_ENABLE_MPS_FALLBACK=1 makes it possible to trigger a subsequent crash using slicing:

$ PYTORCH_ENABLE_MPS_FALLBACK=1 python3 -c 'import torch;  print(torch.view_as_complex(torch.randn(1,4,2).to("mps"))[...,:-1,:])' 

<string>:1: UserWarning: The operator 'aten::view_as_complex' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at  /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
<string>:1: UserWarning: 0The operator aten::view_as_complex appears to be a view operator, but it has no implementation for the backend "mps:0". View operators don't support falling back to run on the CPU, since the tensor's storage cannot be shared across devices. (Triggered internally at  /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/CPUFallback.cpp:175.)
libc++abi: terminating with uncaught exception of type c10::TypeError: Trying to convert ComplexFloat to the MPS backend but it does not have support for that dtype.
Exception raised from getMPSDataType at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/OperationUtils.mm:124 (most recent call first):
frame #0: at::native::mps::getMPSDataType(c10::ScalarType) + 452 (0x11142a080 in libtorch_cpu.dylib)
frame #1: invocation function for block in at::native::as_strided_tensorimpl_mps(at::Tensor const&, c10::ArrayRef<long long>, c10::ArrayRef<long long>, c10::optional<long long>) + 136 (0x111448e94 in libtorch_cpu.dylib)
frame #2: invocation function for block in at::native::mps::MPSGraphCache::CreateCachedGraph(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, at::native::mps::MPSCachedGraph* () block_pointer) + 216 (0x111437704 in libtorch_cpu.dylib)

albanD commented 1 year ago

@proger complex dtypes are not supported on MPS at all right now, I'm afraid. cc @kulinseth

okpatil4u commented 1 year ago

@albanD any timeline for this?

whr778 commented 1 year ago

I ran into this while fine-tuning mT5: NotImplementedError: The operator 'aten::_index_put_impl_' is not current implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

FrankHeijden commented 1 year ago

I tried setting the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 because aten::cumsum.out is not implemented yet. However, I got the following error after setting it and trying to run an XGLM huggingface model:

/opt/homebrew/Caskroom/miniforge/base/envs/incoder-env/lib/python3.9/site-packages/transformers/models/xglm/modeling_xglm.py:155: UserWarning: The operator 'aten::cumsum.out' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at  /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
  incremental_indices = (torch.cumsum(mask, dim=1).type_as(mask) + past_key_values_length) * mask
/AppleInternal/Library/BuildRoots/b6051351-c030-11ec-96e9-3e7866fcf3a1/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm:343: failed assertion `unsupported datatype for constant'

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

albanD commented 1 year ago

@okpatil4u PyTorch has more than 2000 different operators. So full support will definitely take quite a while. That's why we have this issue to help us prioritize which ones we're working on first.

okpatil4u commented 1 year ago

Wow! Thank you @albanD for the amazing work that you and your team put into PyTorch.

khanh-lt commented 1 year ago

The operator 'aten::_slow_conv2d_forward' is not current implemented for the MPS device

Ansh3101 commented 1 year ago

I have found that aten::slow_conv_transpose2d.out is not implemented for the MPS device.

Code

device = torch.device('mps')
z = torch.randn(25, 100, 1, 1).to(device)
out = gen(z)
show_tensor_images(out, num_images=25)
show_tensor_images(real, num_images=25, title='Real Images')

Error Message

NotImplementedError: The operator 'aten::slow_conv_transpose2d.out' is not current implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

amholler commented 1 year ago

Gated Linear Units are typically included in TabNet models; it would be great to add MPS support for aten::glu.out Getting an error message from Ludwig ML default TabNet:

  File "/Users/anne/ludwig_env/lib/python3.9/site-packages/ludwig/modules/tabnet_modules.py", line 205, in forward
    hidden = nn.functional.glu(hidden, dim=-1)  # [bs, s]
  File "/Users/anne/ludwig_env/lib/python3.9/site-packages/torch/nn/functional.py", line 1451, in glu
    return torch._C._nn.glu(input, dim)
NotImplementedError: The operator 'aten::glu.out' is not current implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
