Describe the solution you'd like
Marius currently only supports single-GPU training. A simple DDP version of model training can be implemented to enable single-machine training with multiple GPUs.
For models without embedding parameters, this is simple, as we only need to add an all-reduce step after n batches have been processed.
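The periodic all-reduce described above can be sketched as follows. This is a minimal, pure-Python stand-in for what `torch.distributed.all_reduce` would do in a real DDP setup; the `Worker` class, the sync interval `n`, and the gradient values are illustrative assumptions, not Marius APIs.

```python
# Sketch: each worker accumulates gradients locally, and every n batches
# an all-reduce averages the accumulated gradients across workers.
# Pure-Python simulation of torch.distributed.all_reduce for illustration.

def all_reduce_mean(grads_per_worker):
    """Average per-worker gradient vectors element-wise (simulated all-reduce)."""
    num_workers = len(grads_per_worker)
    length = len(grads_per_worker[0])
    return [sum(g[i] for g in grads_per_worker) / num_workers
            for i in range(length)]

class Worker:
    """Illustrative stand-in for one GPU's training loop state."""
    def __init__(self):
        self.accum = [0.0, 0.0]  # locally accumulated gradients

    def process_batch(self, grad):
        self.accum = [a + g for a, g in zip(self.accum, grad)]

n = 2  # synchronize every n batches (assumed hyperparameter)
workers = [Worker(), Worker()]
batches = [[[1.0, 2.0], [3.0, 4.0]],   # batches seen by worker 0
           [[5.0, 6.0], [7.0, 8.0]]]   # batches seen by worker 1

for step in range(n):
    for w, batch_stream in zip(workers, batches):
        w.process_batch(batch_stream[step])

# After n batches, all-reduce: every worker ends up with the mean gradient.
mean_grad = all_reduce_mean([w.accum for w in workers])
```

In the real implementation the accumulation and averaging would operate on GPU tensors over NCCL, but the synchronization pattern is the same.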
For models with embeddings residing on disk or CPU memory, we can use the same all-reduce approach, but we need to be careful to minimize staleness when writing embedding updates back to the CPU from each GPU.
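One way to bound staleness is to write averaged embedding deltas back to the CPU-resident table immediately after each sync interval, so no GPU ever trains on embeddings more than one interval old. The sketch below illustrates that write-back step; the table layout, `write_back` helper, and delta values are hypothetical, not Marius code.

```python
# Sketch: after a sync interval, average each node's embedding deltas across
# the GPUs that touched it, then apply them to the CPU-resident table right
# away to minimize staleness. All names and values here are illustrative.

cpu_embeddings = {0: [1.0, 1.0], 1: [2.0, 2.0]}  # node_id -> embedding (CPU)

# Per-GPU embedding deltas computed during this sync interval.
gpu_updates = [
    {0: [0.2, 0.0]},                  # GPU 0 touched node 0
    {0: [0.0, 0.2], 1: [0.1, 0.1]},   # GPU 1 touched nodes 0 and 1
]

def write_back(table, updates_per_gpu):
    """Average deltas per node across the GPUs that touched it, then apply."""
    touched = {}
    for updates in updates_per_gpu:
        for node, delta in updates.items():
            touched.setdefault(node, []).append(delta)
    for node, deltas in touched.items():
        k = len(deltas)
        avg = [sum(d[i] for d in deltas) / k for i in range(len(deltas[0]))]
        table[node] = [e + a for e, a in zip(table[node], avg)]

write_back(cpu_embeddings, gpu_updates)
```

Writing back eagerly trades extra CPU-GPU transfer for freshness; deferring the write-back to the end of an epoch would amortize the transfer cost but let GPUs train against increasingly stale embeddings.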