j-f1 closed this issue 1 year ago
Potentially you could try not tracing the UNet, if that is causing the issue: https://github.com/riffusion/riffusion-inference/blob/main/riffusion/server.py#L99
I do expect it to be too slow for real time on MPS, and not tracing will slow it down further.
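A minimal sketch of what gating the trace on device type could look like. This assumes a standard `torch.nn.Module` UNet and `torch.jit.trace`; the `maybe_trace_unet` helper is hypothetical, not code from the repo:

```python
import torch

def maybe_trace_unet(unet: torch.nn.Module, example_inputs, device: torch.device):
    """Trace the UNet for faster inference on CUDA; skip tracing on MPS/CPU,
    where the trace can fail or hit unsupported ops."""
    if device.type == "cuda":
        # torch.jit.trace records the forward pass into a static graph,
        # which typically speeds up repeated inference calls
        return torch.jit.trace(unet, example_inputs)
    # Eager execution: somewhat slower, but avoids trace-time failures on MPS
    return unet
```

On the non-CUDA path the module is returned unchanged, so callers can use the result identically whether or not tracing happened.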
Can confirm removing the trace works. I also did my own trace, but see a roughly negligible difference, perhaps because aten::repeat_interleave.self_int is not supported on MPS.
I get ~2 it/s on an Apple M1 Max (~18 s per image generation).
The real bottleneck appears to be wav_bytes_from_spectrogram_image at ~100 s. It doesn't look like MPS supports GriffinLim; it segfaults for me, but I only gave it a brief look.
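For context on why this step is expensive and FFT-dependent: Griffin-Lim reconstructs phase from a magnitude spectrogram by repeated STFT round trips. A minimal NumPy/SciPy sketch of the algorithm, purely illustrative and not the project's actual implementation:

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(mag, n_iter=32, nperseg=256, noverlap=192, seed=0):
    """Estimate a waveform from an STFT magnitude by alternately
    resynthesizing audio and re-estimating phase (Griffin-Lim)."""
    rng = np.random.default_rng(seed)
    # Start from random phase; each iteration keeps the target magnitude
    # and takes the phase of the resynthesized signal's STFT.
    phase = rng.uniform(-np.pi, np.pi, size=mag.shape)
    for _ in range(n_iter):
        _, x = istft(mag * np.exp(1j * phase), nperseg=nperseg, noverlap=noverlap)
        _, _, spec = stft(x, nperseg=nperseg, noverlap=noverlap)
        phase = np.angle(spec)
    return x
```

Every iteration is an inverse and forward STFT over complex values, which is exactly the kind of ComplexFloat work that an accelerator backend has to support for the vocoding step to run on-device.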
Interesting about GriffinLim. There's been some talk of finding a neural vocoder that has better quality, could be something to track.
Follow-up: a lot of Fourier operations are not supported on MPS yet, in particular the ComplexFloat data type: https://github.com/pytorch/pytorch/issues/78044
Until that is resolved, generation could work but the entire stack will not.
MPS is now supported as a device with CPU fallback for some operations. See the README for a description!
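One practical note on the CPU fallback, assuming a recent PyTorch build: PyTorch can route individual ops that are missing on the MPS backend to the CPU via an environment variable. The server invocation shown in the comment is illustrative, not the project's documented command:

```shell
# Ask PyTorch to run ops missing on the MPS backend on the CPU instead
export PYTORCH_ENABLE_MPS_FALLBACK=1
# then launch the server as usual, e.g.:
#   python -m riffusion.server   (illustrative invocation, check the README)
```

This keeps the bulk of the model on the GPU while unsupported ops (such as the complex-valued FFT work mentioned above) transparently fall back, at some cost in transfer overhead.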
I'm not sure if current-gen Apple Silicon GPUs are capable of doing the computation fast enough (probably not, to be honest), but it would be great to get it working so folks can at least try it out. I tried changing all the mentions of `cuda` in the project to `mps`, but I'm getting an error in TorchScript which suggests some changes need to be made to the model so it doesn't assume CUDA. Is there a way to fix/patch this?
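One way to avoid hard-coding the device name throughout the code is a small selection helper. This is a sketch; `pick_device` is a hypothetical function, not part of the repo:

```python
import torch

def pick_device() -> torch.device:
    """Pick the best available backend: CUDA first, then MPS, then CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # torch.backends.mps is present in PyTorch >= 1.12
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")
```

Calling `model.to(pick_device())` at startup then replaces the scattered `cuda` literals with a single decision point, which is also where a project could warn about known-unsupported ops on MPS.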