mila-iqia / blocks-extras

A collection of extensions to the Blocks framework
MIT License

Synchronized training (EASGD) #38

Closed rizar closed 8 years ago

rizar commented 8 years ago

Contains my work in progress on an extension for Platoon. Includes #37, will be rebased upon the merge of that PR.

rizar commented 8 years ago

Usage example: Disclaimer: I cannot get any benefit from EASGD so far :(

My reasoning is the following: even though communication is definitely overwhelming for the simple MNIST demo that I use, I should see much better results after running two training processes for N epochs in parallel than after running one process for the same number of epochs. So far, it only gets worse when I use more than one process.

rizar commented 8 years ago

@nouiz @abergeron @carriepl : you can track the progress here.

carriepl commented 8 years ago

@rizar : What {alpha, number of processes, sync_freq} combinations did you try?
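For reference, here is a minimal sketch of how those three knobs interact in an elastic-averaging scheme. It is a toy, single-process simulation in plain Python, not the code in this PR and not Platoon's API: the names `elastic_sync` and `train`, the quadratic objective, and the default values are all assumptions made for illustration. Each worker takes plain gradient steps, and every `sync_freq` steps it exchanges an `alpha`-scaled difference with the shared center parameters.

```python
def elastic_sync(worker, center, alpha):
    """One elastic exchange, in place: move worker and center toward each other."""
    for k in range(len(worker)):
        diff = alpha * (worker[k] - center[k])
        worker[k] -= diff   # worker is pulled toward the center
        center[k] += diff   # center is pulled toward the worker


def train(num_workers=2, alpha=0.5, sync_freq=4, steps=200, lr=0.1):
    """Simulate elastic averaging on a toy quadratic (hypothetical setup).

    Objective: f(x) = 0.5 * ||x - target||^2, so the gradient is x - target.
    Workers are stepped one after another here; real workers run concurrently.
    """
    target = [1.0, -2.0]
    center = [0.0, 0.0]
    workers = [[0.0, 0.0] for _ in range(num_workers)]
    for step in range(1, steps + 1):
        for w in workers:
            # Local SGD step on the worker's own parameter copy.
            for k in range(len(w)):
                w[k] -= lr * (w[k] - target[k])
            # Communicate with the center only every sync_freq steps.
            if step % sync_freq == 0:
                elastic_sync(w, center, alpha)
    return center, workers


if __name__ == "__main__":
    center, workers = train()
    print("center:", center)
    print("workers:", workers)
```

A larger `alpha` ties the workers more tightly to the center, while a larger `sync_freq` lets them explore longer between exchanges, so the two are worth varying together with the number of processes.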

carriepl commented 8 years ago

Also, FYI, there is a PR under review to rename Soldier and Lieutenant to more neutral names. Obviously, it would have an effect on this PR. https://github.com/mila-udem/platoon/pull/22

rizar commented 8 years ago

Sure, thanks for the heads-up, I will rename everything as soon as you guys merge the PR.


carriepl commented 8 years ago

@rizar : the PR has been merged

rizar commented 8 years ago

@sotelo, this is WIP for data-parallel training

rizar commented 8 years ago

I ran it to convergence, and we can see that the workers do not help each other at all. Blue line is the progress of a job working alone, red and green are two workers working in parallel. Test error is displayed.

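[Plot: test error over training for a single job working alone (blue) and two workers running in parallel (red, green).]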

rizar commented 8 years ago

This was rebased and can now be used with the latest master.

nouiz commented 8 years ago

Once the PR containing https://github.com/abergeron/platoon/commit/afcea0ff2014156ff9d8377f48f6ba1542fe1750 is merged, we can have the worker close the connection correctly. It would be great to update this PR with that.

rizar commented 8 years ago

For the record, with ASGD I do get the 2x speedup on 2 GPUs.

rizar commented 8 years ago

Status update: this PR is ready to be reviewed, except for blocks-parallel. In fact I think that we should not merge blocks-parallel, because the users can quickly implement scripts like these using their scripting language of choice.

@bartvm , would you like to take a look?

nouiz commented 8 years ago

This can be reviewed, but there is a PR to Platoon that changes the interface a little. So before merging this, wait for that PR to be merged and this one to be updated:

https://github.com/mila-udem/platoon/pull/29


rizar commented 8 years ago

OK, no problem, I can wait.

nouiz commented 8 years ago

The other PR is now merged.

rizar commented 8 years ago

@abergeron , this PR is functional again, you can review it.