Closed by pierotofy 8 months ago
This feature could become very useful in the near future. 100x slower is actually not too slow. With extra CPU-specific optimizations such as Intel AVX-512 and AMX, I think another 10x performance boost should be feasible.
Agree! I was actually quite pleased with the performance (and it's a first implementation, so there's lots of room for improvement).
Adds CPU support, which makes it possible to generate splats without a GPU (if you have time to wait).
Currently it runs about 100x slower than CUDA, but there's probably room for improvement. In particular, some calculations in the renderer could likely be parallelized, and it's worth evaluating whether the project_forward and compute_sh_forward functions would benefit from a manual C++ implementation (one that does not rely on libtorch, which conveniently gives us the backward passes for free).
The results should match the CUDA implementation, up to small floating-point precision differences.
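One way to check this kind of "equal up to precision" claim is a tolerance-based comparison rather than exact equality. The sketch below is an illustration, not part of OpenSplat: `all_close` is a hypothetical helper that uses the same criterion as NumPy and PyTorch `allclose` (`|a - b| <= atol + rtol * |b|`), and the default tolerances are assumptions that would need tuning for real CPU-versus-CUDA outputs.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: true if every pair of values agrees within
// atol + rtol * |b[i]| (the criterion used by NumPy/PyTorch allclose).
// Vectors of different length are never considered close.
bool all_close(const std::vector<float>& a,
               const std::vector<float>& b,
               float rtol = 1e-5f,
               float atol = 1e-8f) {
    if (a.size() != b.size()) {
        return false;
    }
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::fabs(a[i] - b[i]) > atol + rtol * std::fabs(b[i])) {
            return false;
        }
    }
    return true;
}
```

In practice, `a` would hold values produced by the CPU path and `b` the corresponding values from the CUDA path, for example rendered pixel values copied back to host memory.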
Closes #5
Try it by passing the --cpu flag to ./opensplat