mila-iqia / platoon

Multi-GPU mini-framework for Theano
MIT License

valid_sync option to copy params before and after validation. #26

Closed abergeron closed 8 years ago

abergeron commented 8 years ago

This also has an implementation of ASGD in the last commit.

abergeron commented 8 years ago

I addressed the comments and ran a basic test of the ASGD rule with the LSTM examples. It's still running, but it has completed a couple of updates successfully. I just don't know yet whether it's going in the right direction.

rizar commented 8 years ago

I tried to do worker.close(), and got an error:

Traceback (most recent call last):
  File "/home/rizar/Dist/blocks-examples/mnist/__init__.py", line 139, in <module>
    args.learning_rate, args.sync_freq, args.alpha)
  File "/home/rizar/Dist/blocks-examples/mnist/__init__.py", line 118, in main
    worker.close()
  File "/home/rizar/Dist/platoon/platoon/channel.py", line 461, in close
    self._shmref.unlink()
posix_ipc.ExistentialError: No shared memory exists with the specified name

abergeron commented 8 years ago

I've made a fix for that.
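
The thread does not show what the fix was. As a rough illustration only, here is one way a close() method could tolerate a shared memory segment that has already been removed. This sketch uses Python's standard multiprocessing.shared_memory module rather than posix_ipc, and SharedBuffer is a hypothetical class, not platoon's channel code. With posix_ipc, the analogous approach would presumably be to catch posix_ipc.ExistentialError around the unlink() call.

```python
from multiprocessing import shared_memory


class SharedBuffer:
    """Hypothetical wrapper for illustration; not platoon's channel class."""

    def __init__(self, size, name=None):
        # Create a new named segment (a name is generated when name is None).
        self._shm = shared_memory.SharedMemory(name=name, create=True, size=size)
        self._closed = False

    def close(self):
        # A second call is a no-op, so close() is safe to call repeatedly.
        if self._closed:
            return
        self._closed = True
        self._shm.close()
        try:
            self._shm.unlink()
        except FileNotFoundError:
            # The segment was already removed, e.g. by another process.
            pass
```

Whether swallowing the error is the right choice depends on which process is meant to own the segment; the sketch assumes that a missing segment at close time is harmless.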

abergeron commented 8 years ago

The example seems to work with the new ASGD rule. I did not check if training performance is any better.
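
The thread does not describe the ASGD rule itself. As a minimal sketch, assuming ASGD here means asynchronous SGD in which each worker adds the change it accumulated since its last sync to the shared parameters and then adopts the shared values, one sync step could look like the following. The function and argument names are hypothetical, and plain Python lists stand in for parameter arrays.

```python
def asgd_sync(local, last_synced, shared):
    """Push this worker's change since the last sync, then pull shared values.

    All three arguments are lists of floats of equal length, updated in place.
    """
    for i in range(len(shared)):
        # Change made locally since the previous sync.
        delta = local[i] - last_synced[i]
        # Apply it on top of whatever other workers have written meanwhile.
        shared[i] += delta
        # Adopt the merged value and remember it for the next sync.
        local[i] = shared[i]
        last_synced[i] = shared[i]
```

For example, if a worker moved its first parameter from 1.0 to 1.5 while another worker moved the shared second parameter from 2.0 to 3.0, a sync leaves both the shared and local parameters at [1.5, 3.0].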