Closed: pfxuan closed this 7 months ago.
Hi @pfxuan! Thanks for the branch. It was helpful as a starting point. I've fixed the memory access error here by converting all inputs to CPU before running the CPU-only rasterizer: https://github.com/andrewkchan/OpenSplat/tree/achan/mps-backend
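A minimal sketch of the "convert all inputs to CPU first" pattern, written with PyTorch's Python API for brevity. OpenSplat itself is C++/libtorch, and `cpu_only_rasterizer` and `rasterize_on_cpu` below are hypothetical names, not functions from either branch. The idea is to copy every input to host memory, run the CPU kernel, then move the result back to the caller's device (e.g. `mps`):

```python
import torch


def cpu_only_rasterizer(means2d: torch.Tensor, colors: torch.Tensor) -> torch.Tensor:
    # Hypothetical placeholder for a kernel that only understands CPU memory.
    # Handing it device tensors is assumed to be what triggers the memory access error.
    if means2d.device.type != "cpu" or colors.device.type != "cpu":
        raise RuntimeError("cpu_only_rasterizer expects CPU tensors")
    return colors.sum(dim=0)


def rasterize_on_cpu(means2d: torch.Tensor, colors: torch.Tensor) -> torch.Tensor:
    # Remember where the caller's tensors live (e.g. "mps").
    device = means2d.device
    # .cpu() copies device tensors to host memory (no copy for CPU tensors);
    # .contiguous() gives pointer-based kernels a dense layout.
    out = cpu_only_rasterizer(means2d.cpu().contiguous(), colors.cpu().contiguous())
    # Move the result back so the rest of the training loop stays on the original device.
    return out.to(device)
```

The host/device round trip on every call is one reason this path stays slow: the heavy lifting still runs on the CPU.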
The code runs on my M3 MacBook with the banana example, but it's still slow.
I don't think MPS is useful until we port the rasterization kernels to Metal, because the bulk of the work will still be done on the CPU. I can work on this and send a pull request when it's ready.
Correct: most of the time is spent in the projection and rasterization steps.
If we rewrite the CPU code using native torch tensor operations, I think we could leverage PyTorch's MPS backend for GPU acceleration. I was able to observe the offloaded workload running on Metal, and I have some proof-of-concept code available. There's also a pure PyTorch implementation that we can explore further: https://github.com/hbb1/torch-splatting
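To illustrate what "rewriting CPU code with native torch tensors" could look like, here is a minimal sketch of per-pixel Gaussian alpha compositing expressed as batched tensor operations instead of nested loops. It is not the proof-of-concept code or torch-splatting's implementation; `pick_device` and `composite` are hypothetical names, the inputs are assumed to be already projected to 2D and sorted front-to-back, and the 0.99 alpha clamp is an assumed convention. Because everything is plain tensor math, the same code runs on `mps` when available and on the CPU otherwise:

```python
import torch


def pick_device() -> torch.device:
    # Use Apple's Metal (MPS) backend when this PyTorch build supports it.
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


def composite(pixels: torch.Tensor, means: torch.Tensor, inv_covs: torch.Tensor,
              opacities: torch.Tensor, colors: torch.Tensor) -> torch.Tensor:
    # pixels: (P, 2), means: (G, 2), inv_covs: (G, 2, 2), opacities: (G,), colors: (G, 3).
    # Gaussians are assumed sorted front-to-back by depth.
    d = pixels[:, None, :] - means[None, :, :]                       # (P, G, 2)
    # Mahalanobis term d^T * inv_cov * d for every pixel/Gaussian pair.
    power = -0.5 * torch.einsum("pgi,gij,pgj->pg", d, inv_covs, d)  # (P, G)
    alpha = (opacities[None, :] * torch.exp(power)).clamp(max=0.99)  # (P, G)
    # Transmittance before Gaussian i: product of (1 - alpha_j) for j < i.
    ones = torch.ones_like(alpha[:, :1])
    trans = torch.cumprod(torch.cat([ones, 1.0 - alpha[:, :-1]], dim=1), dim=1)
    # Weighted sum of colors per pixel.
    return (alpha * trans) @ colors                                  # (P, 3)
```

Usage would be to create the tensors with `device=pick_device()` and call `composite` unchanged. On a machine without a GPU backend, the same vectorized ops still dispatch to PyTorch's SIMD-optimized CPU kernels.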
Interesting idea! That seems like it would be useful for the CPU kernels as well, since they could then take advantage of SIMD even on machines without a GPU-accelerated backend.
I'm working on a Metal shader port in the branch above, and it's coming along nicely. I hope to have something out in the next few days.
I was able to run the macOS arm64 build on an M2 Max chip, but it's very slow. It would be great if we could make the MPS backend work. I started a quick test from this branch, but there seems to be a memory allocation problem.