albertvaka opened this issue 1 year ago
It looks like PyTorch already supports the M1 chip, so it might be enough to use `torch.device('mps')`. I'll give it a try.
Using `torch.device('mps')` I get:
```
The operator 'aten::index.Tensor' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
```
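For reference, the fallback variable has to be in the environment before `torch` is imported, so setting it at the top of the entry script (or exporting it in the shell before launching) is the usual approach. A minimal sketch:

```python
import os

# PYTORCH_ENABLE_MPS_FALLBACK must be set before torch is imported,
# otherwise the CPU fallback for unimplemented MPS ops is not enabled.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

# import torch  # ops missing on MPS will now fall back to CPU (slower)
print(os.environ["PYTORCH_ENABLE_MPS_FALLBACK"])
```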
Then, after setting `PYTORCH_ENABLE_MPS_FALLBACK=1`, I get:
```
Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
```
This comes from https://github.com/jina-ai/guided-diffusion. Next I will try to monkey patch that code to use float32, since it seems there's no global way to tell NumPy to default to float32.
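As an illustration of the kind of patch needed (the function name below is hypothetical; the real schedule code lives in guided-diffusion): NumPy creates float64 arrays by default, and MPS only supports float32, so every array that eventually becomes a tensor has to be created with, or cast to, an explicit float32 dtype:

```python
import numpy as np

# Hypothetical sketch: guided-diffusion builds its diffusion schedule with
# NumPy, whose default dtype is float64. MPS only supports float32, so
# force the dtype wherever arrays are created.
def beta_schedule_float32(num_steps=1000):
    # np.linspace returns float64 unless a dtype is given explicitly.
    return np.linspace(1e-4, 0.02, num_steps, dtype=np.float32)

betas = beta_schedule_float32()
print(betas.dtype)  # float32 -- safe to hand to torch.from_numpy on MPS
```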
By making all arrays in guided-diffusion float32, I managed to get the code to continue until it reaches this repo's `cond_fn`. However, something goes wrong when `cond_fn` calls `MakeCutouts`, which in turn calls PyTorch's `RandomAffine`, and the program crashes (it triggers an assertion in Apple's code):
```
-:27:11: error: invalid input tensor shapes, indices shape and updates shape must be equal
-:27:11: note: see current operation: %25 = "mps.scatter_along_axis"(%23, %arg4, %24, %1) {mode = 6 : i32} : (tensor<150528xf32>, tensor<224xf32>, tensor<50176xi32>, tensor<i32>) -> tensor<150528xf32>
/AppleInternal/Library/BuildRoots/a0876c02-1788-11ed-b9c4-96898e02b808/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphExecutable.mm:1267: failed assertion `Error: MLIR pass manager failed'
```
I've tried commenting out the `T.RandomAffine(...)` transformation, and it continues further but fails again when `cond_fn` calls `model_stat['clip_model'].encode_image(...)`, which ends up calling PyTorch's `layer_norm` and crashes with:
```
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
```
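For what it's worth, this particular error is PyTorch's generic complaint about calling `.view(...)` on a non-contiguous tensor; it can be reproduced outside this repo, and the error message itself names the two fixes. A minimal sketch (the shapes are illustrative, not taken from this code):

```python
import torch

# A transpose makes the tensor non-contiguous, so .view() cannot
# reinterpret its memory with the requested shape.
x = torch.arange(24).reshape(2, 3, 4).transpose(0, 1)  # shape (3, 2, 4)

# x.view(6, 4) would raise the "view size is not compatible ..." error.
y = x.reshape(6, 4)            # reshape copies the data if it has to
z = x.contiguous().view(6, 4)  # or make the tensor contiguous first
print(torch.equal(y, z))  # True
```

Whether patching the upstream `.view(...)` call to `.reshape(...)` is acceptable (it may silently copy) would be a question for the clip/layer_norm code path, not this repo.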
I give up... never give up.
Can we expect this to be ported from CUDA to the Apple Accelerate framework (or something else) so it can run on Mac laptops?