Closed rizar closed 8 years ago
Usage example:
Disclaimer: I cannot get any benefit from EASGD so far :(
My reasoning is the following: even though communication overhead definitely dominates in the simple MNIST demo that I use, I should still see much better results after running two training processes in parallel for N epochs than after running one process for the same number of epochs. So far, it only gets worse when I use more than one process.
@nouiz @abergeron @carriepl : you can track the progress here.
@rizar : What {alpha, number of processes, sync_freq} combinations did you try?
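For readers following the thread, here is a minimal sketch of what the {alpha, number of processes, sync_freq} knobs control, assuming the "EAGS" mentioned above refers to elastic averaging SGD (EASGD). It is not Platoon's or this PR's actual code: the function names `easgd_sync` and `run_easgd`, the learning rate, and the toy objective f(x) = 0.5 * (x - 3)^2 are all hypothetical and chosen only for illustration.

```python
def easgd_sync(local, center, alpha):
    """One elastic exchange between a worker and the central parameters.

    The worker moves toward the center by alpha * (local - center) and the
    center moves toward the worker by the same amount.
    """
    new_local, new_center = [], []
    for x, c in zip(local, center):
        diff = alpha * (x - c)
        new_local.append(x - diff)
        new_center.append(c + diff)
    return new_local, new_center


def run_easgd(n_workers, alpha, sync_freq, n_steps, lr=0.1, target=3.0):
    """Toy, single-process simulation (assumption: no real communication).

    Every worker runs plain gradient descent on f(x) = 0.5 * (x - target)**2
    and exchanges parameters with the center every `sync_freq` steps.
    """
    center = [0.0]
    workers = [[0.0] for _ in range(n_workers)]
    for step in range(1, n_steps + 1):
        for i in range(n_workers):
            # Local SGD step: the gradient of the toy objective is (x - target).
            workers[i] = [x - lr * (x - target) for x in workers[i]]
            if step % sync_freq == 0:
                workers[i], center = easgd_sync(workers[i], center, alpha)
    return center, workers


if __name__ == "__main__":
    center, workers = run_easgd(n_workers=2, alpha=0.5, sync_freq=2, n_steps=400)
    print("center:", center, "workers:", workers)
```

In this sketch a larger `alpha` ties the workers more tightly to the center, and a larger `sync_freq` lets them drift longer between exchanges. Whether either helps on a real task is exactly the empirical question asked above.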
Also, FYI, there is a PR under review to rename Soldier and Lieutenant to more neutral names. Obviously, it would have an effect on this PR. https://github.com/mila-udem/platoon/pull/22
Sure, thanks for the heads-up, I will rename everything as soon as you guys merge the PR.
@rizar : the PR has been merged
@sotelo, this is WIP for data-parallel training
I ran it to convergence, and we can see that the workers do not help each other at all. Blue line is the progress of a job working alone, red and green are two workers working in parallel. Test error is displayed.
This was rebased and can now be used with the latest master.
https://github.com/abergeron/platoon/commit/afcea0ff2014156ff9d8377f48f6ba1542fe1750 When the corresponding PR is merged, we can have the worker close the connection correctly. It would be great to update this PR with that.
For the record, with ASGD I do get the 2x speedup on 2 GPUs.
Status update: this PR is ready to be reviewed, except for blocks-parallel. In fact, I think that we should not merge blocks-parallel, because users can quickly implement scripts like these using their scripting language of choice.
@bartvm , would you like to take a look?
This can be reviewed, but there is a PR to platoon that changes the interface a little. So before merging this, wait for that PR to be merged and for this one to be updated:
https://github.com/mila-udem/platoon/pull/29
OK, no problem, I can wait.
The other PR is now merged.
@abergeron , this PR is functional again, you can review it.
Contains my work in progress on an extension for Platoon. Includes #37, will be rebased upon the merge of that PR.