shacklettbp / madrona

Multi-GPU support? #22

Closed · Kchour closed this 5 months ago

Kchour commented 6 months ago

First of all, I want to say great work!

I was wondering whether Madrona supports a host PC with multiple GPU devices. As far as I can tell, it does not: from a cursory glance through the codebase, cudaSetDevice() is used to select the current GPU, but I haven't figured out where, when, and how many times it's called. If it's called just once, that implies only one GPU is ever used.
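
For context, my mental model of cudaSetDevice() is the standard CUDA one: it is sticky per host thread, and every allocation and kernel launch after the call targets the selected device. A minimal standalone sketch (not Madrona code):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int numDevices = 0;
    cudaGetDeviceCount(&numDevices);
    std::printf("%d CUDA device(s) visible\n", numDevices);

    // cudaSetDevice() is sticky per host thread: all subsequent
    // allocations and kernel launches target device 0 until the
    // thread calls it again with a different ID.
    cudaSetDevice(0);
    void *bufDev0 = nullptr;
    cudaMalloc(&bufDev0, 1024); // lives on device 0

    if (numDevices > 1) {
        cudaSetDevice(1);
        void *bufDev1 = nullptr;
        cudaMalloc(&bufDev1, 1024); // lives on device 1
        cudaFree(bufDev1);
    }

    cudaSetDevice(0);
    cudaFree(bufDev0);
    return 0;
}
```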

Please correct my understanding! Thanks!

shacklettbp commented 6 months ago

Madrona only uses one GPU per batch simulator instance. You can create multiple batch simulators on different GPUs (in one process, or across multiple processes) to do multi-GPU training.
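
Concretely, the one-simulator-per-GPU pattern in a single process looks roughly like the sketch below. `SimManager` and its `Config::gpuID` field are hypothetical stand-ins for an app-level wrapper (in the style of the Manager classes in the Madrona example projects), not the library API itself:

```cpp
#include <thread>
#include <vector>

// Hypothetical wrapper around one Madrona batch simulator instance;
// a real version would construct the GPU executor on cfg.gpuID.
struct SimManager {
    struct Config {
        int gpuID;     // CUDA device this simulator is pinned to
        int numWorlds; // batch size on this device
    };

    explicit SimManager(const Config &cfg) : cfg_(cfg) {
        // Real code: create the batch simulator here, bound to cfg.gpuID.
    }

    void step() {
        // Real code: advance all cfg_.numWorlds worlds by one step.
    }

    Config cfg_;
};

int main() {
    const int numGPUs = 2; // e.g. from cudaGetDeviceCount()
    std::vector<std::thread> workers;

    for (int gpu = 0; gpu < numGPUs; gpu++) {
        workers.emplace_back([gpu] {
            // One independent batch simulator per GPU.
            SimManager sim({ .gpuID = gpu, .numWorlds = 1024 });
            for (int i = 0; i < 1000; i++) {
                sim.step(); // runs entirely on this GPU
            }
        });
    }

    for (auto &t : workers) {
        t.join();
    }
    return 0;
}
```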

Kchour commented 6 months ago

Understood! I was thinking about an application where the available memory on a single GPU is not enough (e.g., 1e4 to 1e6 entities per simulation), hence the use of multiple devices to overcome this limitation (similar in spirit to a distributed simulation).

Am I correct to assume that all data in the registry table (i.e., where components and entity IDs are stored) is globally accessible from each CUDA block? If so, I suppose the major hurdle is how to communicate data between separate CUDA devices optimally (and to limit that communication as much as possible). A naive approach would be extremely inefficient... sounds like it could be another chapter in a certain thesis :)

shacklettbp commented 6 months ago

Yes, the ECS state all lives in global CUDA memory. For very large simulations, work would need to be done to share ECS state between GPUs efficiently. For simple parallel-for ECS systems, I can think of fairly simple strategies that would work well across multiple GPUs. The challenges arise when reductions or other global operations across the simulation are necessary; then you would need to do something more intelligent to minimize cross-GPU memory traffic.
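
To illustrate the easy case (a generic CUDA sketch, not Madrona internals): if each GPU owns a disjoint slice of the entity data, a per-entity parallel-for system needs no communication at all; cross-GPU traffic only appears when a global operation is required.

```cpp
#include <cuda_runtime.h>
#include <vector>

// A per-entity "system": each entity updates independently, so a
// disjoint partition of entities across GPUs needs no communication.
__global__ void integratePositions(float *pos, const float *vel,
                                   int numEntities, float dt) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numEntities) {
        pos[i] += vel[i] * dt;
    }
}

int main() {
    int numGPUs = 0;
    cudaGetDeviceCount(&numGPUs);

    const int entitiesPerGPU = 1'000'000;
    std::vector<float *> pos(numGPUs), vel(numGPUs);

    for (int g = 0; g < numGPUs; g++) {
        cudaSetDevice(g);
        cudaMalloc(&pos[g], entitiesPerGPU * sizeof(float));
        cudaMalloc(&vel[g], entitiesPerGPU * sizeof(float));
        cudaMemset(pos[g], 0, entitiesPerGPU * sizeof(float));
        cudaMemset(vel[g], 0, entitiesPerGPU * sizeof(float));

        // Launch the per-entity system on this device's slice only.
        int threads = 256;
        int blocks = (entitiesPerGPU + threads - 1) / threads;
        integratePositions<<<blocks, threads>>>(pos[g], vel[g],
                                                entitiesPerGPU, 0.016f);
    }

    for (int g = 0; g < numGPUs; g++) {
        cudaSetDevice(g);
        cudaDeviceSynchronize();
        // A global operation (e.g. a sum over all entities) is where
        // cross-GPU traffic shows up: each device reduces its own slice,
        // then the partial results are combined via peer-to-peer copies
        // or a collective library such as NCCL.
        cudaFree(pos[g]);
        cudaFree(vel[g]);
    }
    return 0;
}
```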

We're definitely interested in applications that would require scaling Madrona up to efficiently leverage a rack of GPUs in interesting ways (rather than a bunch of individual GPUs working largely independently). This work isn't a high priority for us right now, but if someone comes to us with a specific application that better multi-GPU functionality would enable, I'd be happy to chat.

shacklettbp commented 5 months ago

Closing this for now; feel free to open another issue or email me if you want to discuss further!