marius-team / marius

Large scale graph learning on a single machine.
https://marius-project.org
Apache License 2.0
160 stars, 45 forks

Add Multi-GPU Support #114

Open JasonMoho opened 2 years ago

JasonMoho commented 2 years ago

Describe the solution you'd like

Marius currently supports only single-GPU training. A simple distributed data-parallel (DDP) version of model training can be implemented to enable single-machine training with multiple GPUs.

For models without embedding parameters, this is simple: we only need to add an all-reduce step across the GPUs after every n batches have been processed.
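To make the proposal concrete, here is a minimal sketch of the "all-reduce after n batches" idea, simulated in plain Python with lists standing in for GPU replicas. It assumes the all-reduce averages accumulated gradients (the issue does not say whether gradients or parameters would be reduced), and the names `all_reduce_mean`, `local_gradient`, and `ddp_round` are hypothetical, not part of Marius.

```python
from typing import List, Sequence, Tuple

Batch = Sequence[Tuple[Sequence[float], float]]  # (features, target) pairs


def all_reduce_mean(per_worker: List[List[float]]) -> List[float]:
    """Simulated all-reduce: element-wise mean across workers."""
    num_workers = len(per_worker)
    return [sum(vals) / num_workers for vals in zip(*per_worker)]


def local_gradient(weights: List[float], batch: Batch) -> List[float]:
    """Gradient of mean squared error for a linear model y = w . x."""
    grad = [0.0] * len(weights)
    for x, y in batch:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi / len(batch)
    return grad


def ddp_round(weights: List[float],
              batches_per_worker: List[List[Batch]],
              lr: float) -> List[float]:
    """One synchronization round.

    Each simulated worker averages gradients over its own n batches,
    then a single all-reduce combines them so that every replica
    applies the same update. (Assumption: gradients are what is reduced.)
    """
    accumulated = []
    for batches in batches_per_worker:
        acc = [0.0] * len(weights)
        for batch in batches:
            g = local_gradient(weights, batch)
            acc = [a + gi / len(batches) for a, gi in zip(acc, g)]
        accumulated.append(acc)
    avg = all_reduce_mean(accumulated)
    return [w - lr * g for w, g in zip(weights, avg)]
```

In a real implementation the reduction would presumably be done by a collective-communication library rather than a Python loop; the sketch only shows where the synchronization point sits relative to the n local batches.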

For models with embeddings residing on disk or CPU memory, we can use the same all-reduce approach, but we need to be careful to minimize staleness when writing embedding updates back to the CPU from each GPU.
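One possible way to think about the write-back problem, sketched in plain Python: if each GPU sends back per-row deltas rather than whole updated rows, the CPU side can sum deltas for rows touched by several GPUs, so no worker's update is overwritten by another's. This is only an illustration under that assumption, not a description of how Marius handles it; `merge_row_deltas` and `write_back` are hypothetical names.

```python
from typing import Dict, List

Row = List[float]


def merge_row_deltas(worker_deltas: List[Dict[int, Row]]) -> Dict[int, Row]:
    """Sum the deltas that different workers produced for the same row id."""
    merged: Dict[int, Row] = {}
    for deltas in worker_deltas:
        for row_id, delta in deltas.items():
            if row_id in merged:
                merged[row_id] = [m + d for m, d in zip(merged[row_id], delta)]
            else:
                merged[row_id] = list(delta)
    return merged


def write_back(cpu_table: Dict[int, Row],
               worker_deltas: List[Dict[int, Row]]) -> None:
    """Apply the merged deltas to the CPU-resident embedding table in place.

    Rows that no worker touched are left unchanged.
    """
    for row_id, delta in merge_row_deltas(worker_deltas).items():
        cpu_table[row_id] = [v + d for v, d in zip(cpu_table[row_id], delta)]
```

For example, if two workers both update row 0, the table ends up with the sum of both contributions instead of whichever worker wrote last. How often such a merge runs relative to the n-batch all-reduce would determine how stale the embeddings seen by each GPU can become.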