pytorch / hydra-torch

Configuration classes enabling type-safe PyTorch configuration for Hydra apps
MIT License

Single-node distributed processing with Hydra #42

Open · briankosw opened this issue 3 years ago

briankosw commented 3 years ago

Distributed processing with Hydra in a single-node, multi-GPU setting, as mentioned here.

This will serve as an introductory example for #38.
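For a sense of scale, here is a minimal sketch of the kind of example under discussion (not the actual PR code): a `@hydra.main` entry point that spawns one process per device and initializes a process group. The `worker` function, the hard-coded `world_size`, the address/port values, and the `gloo` backend are all illustrative assumptions.

```python
import os

import hydra
import torch.distributed as dist
import torch.multiprocessing as mp
from omegaconf import DictConfig


def worker(rank: int, world_size: int) -> None:
    # Rendezvous details for the default env:// init method; these
    # values are placeholders for a single-node run.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    # gloo lets the sketch run without GPUs; nccl would be the usual
    # choice for an actual multi-GPU job.
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    print(f"rank {dist.get_rank()} of {dist.get_world_size()} initialized")
    dist.destroy_process_group()


@hydra.main(config_path=None)
def main(cfg: DictConfig) -> None:
    # One process per device on a single node; in a fuller example,
    # world_size would come from the config instead of a constant.
    world_size = 2
    mp.spawn(worker, args=(world_size,), nprocs=world_size)


if __name__ == "__main__":
    main()
```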

briankosw commented 3 years ago

@romesco would love your feedback on this!

romesco commented 3 years ago

Sounds great! What do you think about using the MNIST example as a base? Or did you have something even simpler in mind?

I want to make sure we don't overcomplicate things on this one. For example, I would say we can start without using the configs directly, since they're somewhat orthogonal to demonstrating how Hydra and DDP interact. If you make a draft PR, I'll run everything and provide feedback, of course =].

omry commented 3 years ago

I think the idea here is to not actually train but just demonstrate basic primitives.
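As a rough illustration of what "basic primitives" could mean inside each worker, here is a hedged sketch of a single collective op. The `demo_all_reduce` helper is hypothetical and assumes the process group from the sketch above has already been initialized.

```python
import torch
import torch.distributed as dist


def demo_all_reduce(rank: int, world_size: int) -> None:
    # Assumes init_process_group() has already run, e.g. inside the
    # worker() sketch above. Each rank contributes its own rank value.
    tensor = torch.tensor([float(rank)])
    dist.all_reduce(tensor, op=dist.ReduceOp.SUM)
    # After the collective, every rank holds 0 + 1 + ... + (world_size - 1).
    print(f"rank {rank}: all_reduce sum = {tensor.item()}")
```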

briankosw commented 3 years ago

> Sounds great! What do you think about using the MNIST example as a base? Or did you have something even simpler in mind?

If you check out this PR, you'll see a basic distributed processing setup using Hydra, along with distributed communication primitives between multiple processes. This is about as simple as it gets, and much simpler than MNIST.

> I want to make sure we don't overcomplicate things on this one. For example, I would say we can start without using the configs directly, since they're somewhat orthogonal to demonstrating how Hydra and DDP interact. If you make a draft PR, I'll run everything and provide feedback, of course =].

So this PR/example will be about how Hydra helps set up distributed processes without using configs? Should the configs aspect be implemented in the other PR?

> I think the idea here is to not actually train but just demonstrate basic primitives.

In that case, I will only demonstrate how Hydra can be used to set up distributed processing.