snuspl / parallax

A Tool for Automatic Parallelization of Deep Learning Training in Distributed Multi-GPU Environments.
Apache License 2.0

Memory usage #16

Closed idibidiart closed 6 years ago

idibidiart commented 6 years ago

Hi,

I'd like to try using this; it seems really nice. I have a question (please note I'm new to TensorFlow):

I have N machines, each with one GPU, M gigabytes of disk storage, and J gigabytes of memory, and my training dataset is accessible to all machines. How do I configure things so that, when training in async data-parallel mode (the PS option), no more than M gigabytes of disk and J gigabytes of memory are used per machine, so I can avoid out-of-memory errors?
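For context, here is how I would normally cap GPU memory and stream data from disk in plain TensorFlow 1.x; is something along these lines the right direction with Parallax? (The 0.9 fraction, batch size, and file path below are just placeholders, and this is generic TensorFlow, not anything from the Parallax API.)

```python
# Plain TensorFlow 1.x sketch (not Parallax-specific): cap per-process GPU
# memory and stream the training data from shared storage instead of
# loading it all into host memory.
import tensorflow as tf

# Limit how much of each GPU's memory this process may allocate.
gpu_options = tf.GPUOptions(
    per_process_gpu_memory_fraction=0.9,  # placeholder fraction
    allow_growth=True)                    # allocate lazily rather than all at once
sess_config = tf.ConfigProto(gpu_options=gpu_options)

# Stream records from the shared dataset so host memory stays bounded.
filenames = tf.data.Dataset.list_files("/shared/train-*.tfrecord")  # placeholder path
dataset = (tf.data.TFRecordDataset(filenames)
           .batch(64)       # placeholder batch size
           .prefetch(1))    # keep only one batch buffered ahead

sess = tf.Session(config=sess_config)
```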

Could you set up a Google Group for this project so we can ask these kinds of questions there?

Thank you!

bgchun commented 6 years ago

@idibidiart We have a Google Group: parallax-dev@googlegroups.com. Thanks!

idibidiart commented 6 years ago

Great! I’ll email any questions that come up and I’ll see you there! Thank you!