Closed Kchour closed 5 months ago
Madrona only uses one GPU for each batch simulator instance. You can create multiple batch simulators on different GPUs (in one process, or multiple processes) to do multi-GPU training.
Understood! I was thinking about an application where the available memory on a single GPU is not enough (e.g., 1e4–1e6 entities per simulation), hence the use of multiple devices to overcome this limitation (similar to a distributed simulation, in a sense).
Am I correct to assume that all data in the registry table (i.e., where components and entity IDs are stored) is globally accessible from each CUDA block? If so, I suppose the major hurdle is how to optimally communicate data between separate CUDA devices (and to limit this as much as possible). A naive approach would be extremely inefficient... sounds like it could be another chapter in a certain thesis :)
Yeah, for very large simulations, work would need to be done to allow ECS state to be shared between GPUs efficiently. And yes, the ECS state is all in global CUDA memory. For simple parallel-for ECS systems, I can think of fairly simple strategies that would work efficiently across multiple GPUs. The challenges arise when reductions / global operations across the simulation are necessary; then you would need to do something more intelligent to minimize cross-GPU memory traffic.
We're definitely interested in applications that would require scaling up Madrona to efficiently leverage a rack of GPUs in interesting ways (rather than a bunch of individual GPUs working largely independently). This work isn't super high priority to us right now, but if someone comes to us with a specific application that would be enabled by better multi-GPU functionality I'd be happy to chat.
Closing this for now, feel free to open another issue or email me if you want to discuss further!
First of all, I want to say great work!
I was just wondering: does Madrona support a host PC with multiple GPU devices? To my best understanding, this feature is not supported. From a cursory glance through the codebase, `cudaSetDevice()` is used to select the current GPU device, but I haven't figured out where, when, and how many times it's called. If it's called just once, that implies only one GPU is being used. Please correct my understanding! Thanks!