Closed by cblc 3 weeks ago
I don't know anything about Metal, so I can't say. What is the state of Apple's C++ support on the GPU?
Apple helped write the Metal backend for the Blender Cycles renderer, which is said to show good performance gains on the M3. I'll be buying a Mac Studio later this semester, when the M3 model is announced (I guess June or so); I'm not sure whether I'll get the Max or the Ultra model (it will depend on the money, as usual). From your comment about "C++ support on the GPU", I'm guessing that your GPU implementation does much more than just accelerate raytracing. Metal can compile code, but I'm not sure what complexity you require.
For NVIDIA GPUs, almost all of pbrt's rendering code (sampling, light source models, BSDF evaluation, etc.) runs on the GPU; it's almost entirely the same code that runs on the CPU. The CPU only parses the scene at the start, submits work to the GPU, and then reads back the final image and writes it to disk. So ideally all of that could be done on Macs as well.
That said, it would be possible to send just the ray intersection work to the GPU. That would involve some communication overhead, though it might not be so bad on Macs given unified memory. It wouldn't work well with the traditional CPU Integrators, since they are very much 'one ray at a time', while you'd want bigger batches of rays to send to the GPU. However, pbrt's WavefrontPathIntegrator works on big batches of rays (and is the basis of the GPU port), and it can also run on the CPU (give --wavefront on the command line). In wavefront/integrator.h there's a WavefrontAggregate interface, and you can see in wavefront/aggregate.{h,cpp} that there is a CPUAggregate.
So... you could implement a MetalAggregate that took rays and did the intersection tests via Metal. In aggregate.cpp you can see how CPUAggregate wraps the functionality of the CPU BVH accelerator. It might not be a ton of work if you know Metal.
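To make the batching idea concrete, here is a minimal, self-contained sketch of what a batch-oriented aggregate could look like. Everything in it is an assumption made for illustration: Ray, Hit, Sphere, BatchAggregate and CpuSphereAggregate are hypothetical names, the brute-force sphere test stands in for a real BVH, and the interface is much simpler than pbrt's actual WavefrontAggregate, which works on pbrt's own queue types. The point is only the shape: one call takes a whole batch of rays, so a Metal-backed class could implement the same interface and dispatch the batch to the GPU instead.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical, simplified types; pbrt's real ray and queue types are richer.
struct Ray { float ox, oy, oz, dx, dy, dz; };  // origin and normalized direction
struct Hit { float t; int primitive; };        // primitive == -1 means "no hit"
struct Sphere { float cx, cy, cz, radius; };

// Hypothetical batch interface: one call intersects many rays at once.
class BatchAggregate {
  public:
    virtual ~BatchAggregate() = default;
    virtual void IntersectClosest(const std::vector<Ray> &rays,
                                  std::vector<Hit> &hits) const = 0;
};

// CPU reference implementation (brute force over spheres, no BVH).
// A Metal-backed variant would upload `rays`, run a kernel, and read back `hits`.
class CpuSphereAggregate : public BatchAggregate {
  public:
    explicit CpuSphereAggregate(std::vector<Sphere> s) : spheres(std::move(s)) {}

    void IntersectClosest(const std::vector<Ray> &rays,
                          std::vector<Hit> &hits) const override {
        hits.assign(rays.size(), Hit{std::numeric_limits<float>::infinity(), -1});
        for (std::size_t i = 0; i < rays.size(); ++i)
            for (std::size_t j = 0; j < spheres.size(); ++j) {
                float t;
                if (IntersectSphere(rays[i], spheres[j], &t) && t < hits[i].t)
                    hits[i] = Hit{t, static_cast<int>(j)};
            }
    }

  private:
    // Solves |o + t d - c|^2 = r^2 for the nearest t > 0, assuming |d| = 1.
    static bool IntersectSphere(const Ray &r, const Sphere &s, float *t) {
        float lx = r.ox - s.cx, ly = r.oy - s.cy, lz = r.oz - s.cz;
        float b = lx * r.dx + ly * r.dy + lz * r.dz;
        float c = lx * lx + ly * ly + lz * lz - s.radius * s.radius;
        float disc = b * b - c;
        if (disc < 0) return false;
        float root = std::sqrt(disc);
        float tHit = (-b - root > 0) ? (-b - root) : (-b + root);
        if (tHit <= 0) return false;
        *t = tHit;
        return true;
    }

    std::vector<Sphere> spheres;
};
```

The integrator only ever sees the abstract interface, so swapping the CPU implementation for a GPU one would not change the calling code.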
One issue is pipelining: the CPU would be idle while the GPU did intersection tests and vice versa. So that might limit the performance benefit. But it would be interesting to try!
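One possible way to reduce that idle time is to keep more than one batch in flight, so the CPU shades one batch while the next is being intersected. The sketch below only illustrates that overlap pattern, under loud assumptions: IntersectBatch, ShadeBatch and RenderPipelined are hypothetical names, the "GPU" is simulated with std::async on another thread, and the batches are treated as independent. In a real path tracer the next bounce's rays come out of shading, so the independent batches would have to come from somewhere else, for example separate groups of pixels.

```cpp
#include <future>
#include <utility>
#include <vector>

// Stand-in for handing a batch to the GPU; here it simply runs on another thread.
std::vector<int> IntersectBatch(std::vector<int> rays) {
    for (int &r : rays) r *= 2;  // placeholder for real intersection work
    return rays;
}

// Stand-in for CPU-side shading of a finished batch.
long ShadeBatch(const std::vector<int> &hits) {
    long sum = 0;
    for (int h : hits) sum += h;
    return sum;
}

// Overlaps "intersection" of batch k with "shading" of batch k-1.
long RenderPipelined(const std::vector<std::vector<int>> &batches) {
    long total = 0;
    std::future<std::vector<int>> inFlight;
    for (const auto &batch : batches) {
        // Launch the next batch before consuming the previous result,
        // so both sides have work at the same time.
        auto next = std::async(std::launch::async, IntersectBatch, batch);
        if (inFlight.valid())
            total += ShadeBatch(inFlight.get());
        inFlight = std::move(next);
    }
    if (inFlight.valid())
        total += ShadeBatch(inFlight.get());  // drain the last batch
    return total;
}
```

Whether this helps in practice would depend on how the intersection and shading costs compare, which would need measuring.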
Thanks a lot for the pointers. I'll take a look at it when I get the machine (I'll write a small Metal-based raytracer first, so that I can work out how to make it efficient in a small code base).
Now that the M3 has hardware raytracing, I think it would be very nice to add support for it. Is your GPU OptiX renderer designed in a way that would make it easy to adapt?