peteanderson80 / Matterport3DSimulator

AI Research Platform for Reinforcement Learning from Real Panoramic Images.

Add support for HPC/Multiple GPU Devices with Limited Access #89

Closed siddk closed 2 years ago

siddk commented 3 years ago

By default, eglInitialize() (whether running natively or inside a container such as Docker or Singularity) tries to bind to GPU:0, the first physical device on the machine.

When MatterSim runs on an HPC cluster where the job scheduler (e.g., SLURM) allocates a GPU other than GPU:0, the job has no access to that first device, so eglInitialize() fails.

This PR addresses the problem by looping through all devices visible to the currently allocated job (the SLURM task) and initializing on the first one that works. No change to the underlying Dockerfile or build process is necessary, since eglext.h is included by default.
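
The fallback loop described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `DeviceHandle`, `firstUsableDevice`, and the `tryInit` callback are hypothetical names, and the EGL calls are kept behind the callback so the sketch compiles without EGL headers. In real code the device list would typically come from `eglQueryDevicesEXT`, and `tryInit` would wrap `eglGetPlatformDisplayEXT(EGL_PLATFORM_DEVICE_EXT, ...)` followed by `eglInitialize`.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for an EGL device handle (EGLDeviceEXT in real code).
using DeviceHandle = int;

// Try each visible device in order and return the index of the first one
// that initializes, or -1 if none does.
// `tryInit` is an assumed callback: with EGL it would obtain a display for
// the device and call eglInitialize on it, returning true on success.
int firstUsableDevice(const std::vector<DeviceHandle>& devices,
                      const std::function<bool(DeviceHandle)>& tryInit) {
  for (std::size_t i = 0; i < devices.size(); ++i) {
    if (tryInit(devices[i])) {
      return static_cast<int>(i);  // stop at the first device that works
    }
  }
  return -1;  // no visible device could be initialized
}
```

Stopping at the first success keeps the behavior unchanged on machines where GPU:0 is available, since that device is tried first and the loop never reaches the others.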