Apple M3 raytracing acceleration

cblc commented 8 months ago

Now that the M3 has hardware raytracing, I think it would be very nice to add support for it. Is your GPU OptiX renderer designed in a way that would make it easy to adapt?

mmp commented 8 months ago

I don't know anything about Metal, so I can't say. What is the state of Apple's C++ support on the GPU?

cblc commented 8 months ago

Apple helped write the Metal backend for the Blender Cycles renderer, which is said to show good performance gains on the M3. I'll be buying a Mac Studio later this semester, when the M3 model is announced (I guess June or so); I'm not sure whether I'll get the Max or the Ultra model (it will depend on the money, as usual). From your comment about "C++ support on the GPU", I'm guessing that your GPU implementation does much more than just accelerate raytracing. Metal can compile code, but I'm not sure what complexity you require.

mmp commented 8 months ago

For NVIDIA GPUs, almost all of pbrt's rendering code (sampling, light source models, BSDF evaluation, etc.) runs on the GPU; it's almost entirely the same code that runs on the CPU. The CPU then only parses the scene at the start, submits work to the GPU, reads back the final image, and writes it to disk. So ideally all of that could be done on Macs as well.

That said, it would be possible to send just the ray intersection work to the GPU. That would involve some communication overhead, though it might not be so bad on Macs given unified memory. It wouldn't work well with the traditional CPU Integrators, since they are very much 'one ray at a time', while you'd want to send bigger batches of rays to the GPU. However, pbrt's WavefrontPathIntegrator works on big batches of rays (and is the basis of the GPU port), and it can also run on the CPU (give --wavefront on the command line). In wavefront/integrator.h there's a WavefrontAggregate interface, and in wavefront/aggregate.{h,cpp} you can see there is a CPUAggregate.

So... you could implement a MetalAggregate that took rays and did the intersection tests via Metal. In aggregate.cpp you can see how CPUAggregate wraps the functionality of the CPU BVH accelerator. It might not be a ton of work if you know Metal.
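
To make the batching idea concrete, here is a minimal sketch of a "many rays in, one result per ray out" interface. Every name in it (BatchIntersector, BruteForceCPUIntersector, Ray, Hit, and so on) is hypothetical and is not pbrt's actual WavefrontAggregate API; the brute-force sphere test merely stands in for a real accelerator so the example is self-contained. A Metal-backed implementation would presumably override the same method and replace the loop with an upload, a GPU dispatch, and a readback.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// All types here are hypothetical illustrations, not pbrt's own.
struct Vec3 { float x, y, z; };
struct Ray { Vec3 o, d; };             // d is assumed to be normalized
struct Sphere { Vec3 c; float r; };
struct Hit { std::size_t prim; float t; };

static float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 Sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

// Batch interface: the integrator hands over many rays at once and
// receives exactly one (possibly empty) result per ray.
class BatchIntersector {
  public:
    virtual ~BatchIntersector() = default;
    virtual void IntersectClosest(const std::vector<Ray> &rays,
                                  std::vector<std::optional<Hit>> &hits) const = 0;
};

// CPU stand-in that tests every ray against every sphere.
class BruteForceCPUIntersector : public BatchIntersector {
  public:
    explicit BruteForceCPUIntersector(std::vector<Sphere> s) : spheres(std::move(s)) {}

    void IntersectClosest(const std::vector<Ray> &rays,
                          std::vector<std::optional<Hit>> &hits) const override {
        hits.assign(rays.size(), std::nullopt);
        for (std::size_t i = 0; i < rays.size(); ++i)
            for (std::size_t p = 0; p < spheres.size(); ++p) {
                std::optional<float> t = IntersectSphere(rays[i], spheres[p]);
                // Keep only the closest hit found so far for this ray.
                if (t && (!hits[i] || *t < hits[i]->t))
                    hits[i] = Hit{p, *t};
            }
    }

  private:
    // Solves |o + t d - c|^2 = r^2 for the smallest positive t.
    static std::optional<float> IntersectSphere(const Ray &ray, const Sphere &s) {
        assert(s.r > 0);
        Vec3 oc = Sub(ray.o, s.c);
        float b = Dot(oc, ray.d);
        float c = Dot(oc, oc) - s.r * s.r;
        float disc = b * b - c;
        if (disc < 0) return std::nullopt;
        float root = std::sqrt(disc);
        float t = -b - root;
        if (t <= 0) t = -b + root;     // ray origin is inside the sphere
        if (t <= 0) return std::nullopt;
        return t;
    }

    std::vector<Sphere> spheres;
};
```

The point of the design is that the caller never sees how the batch is processed, so a CPU and a GPU backend can be swapped behind the same call.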

One issue is pipelining: the CPU would be idle while the GPU did intersection tests and vice versa. So that might limit the performance benefit. But it would be interesting to try!
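
One way the idle time might be reduced is to keep one batch in flight on the "GPU" while the CPU works on the previous batch's results. The sketch below illustrates that overlap with std::async. TraceBatch and ShadeBatch are hypothetical stand-ins (they just double and sum integers), and the sketch assumes the batches are independent of each other, for example separate image tiles. In a path tracer the next bounce's rays depend on shading the current hits, so this only helps where such independent work exists.

```cpp
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical stand-in for GPU intersection work on one batch.
std::vector<int> TraceBatch(std::vector<int> rays) {
    for (int &r : rays) r *= 2;
    return rays;
}

// Hypothetical stand-in for CPU shading work on one batch of results.
long ShadeBatch(const std::vector<int> &hits) {
    return std::accumulate(hits.begin(), hits.end(), 0L);
}

// Overlaps tracing of batch i+1 with shading of batch i.
long RenderPipelined(const std::vector<std::vector<int>> &batches) {
    long total = 0;
    if (batches.empty()) return total;

    // Start tracing the first batch before entering the loop.
    std::future<std::vector<int>> inFlight =
        std::async(std::launch::async, TraceBatch, batches[0]);

    for (std::size_t i = 0; i < batches.size(); ++i) {
        // Wait for the batch that was already being traced.
        std::vector<int> hits = inFlight.get();
        // Kick off the next batch so it runs while this one is shaded.
        if (i + 1 < batches.size())
            inFlight = std::async(std::launch::async, TraceBatch, batches[i + 1]);
        total += ShadeBatch(hits);
    }
    return total;
}
```

With real workloads the benefit depends on how evenly the intersection and shading costs are balanced; if one side dominates, the other still spends most of its time waiting.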

cblc commented 8 months ago

Thanks a lot for the pointers. I'll take a look at it when I get the machine (I'll write a small Metal-based raytracer first, so that I can learn how to optimize its efficiency in a small code base).

mmp / pbrt-v4

Apple M3 raytracing acceleration #419