ufal / neuralmonkey

An open-source tool for sequence learning in NLP built on TensorFlow.
BSD 3-Clause "New" or "Revised" License
410 stars 104 forks

Multi-GPU support #491

Open jindrahelcl opened 7 years ago

StoyanVenDimitrov commented 7 years ago

Hi, is there a branch where you already started working on this?

varisd commented 7 years ago

Hi, as far as I know, there is no branch dedicated to this issue yet.

varisd commented 7 years ago

I was looking at the possible solutions to this problem and this seemed like a good solution: https://www.tensorflow.org/tutorials/deep_cnn#training_a_model_using_multiple_gpu_cards

Basically, we add an additional option to the [tf_manager] section (or maybe [main]) specifying which GPU devices are available (it would be even better if we could detect them from CUDA_VISIBLE_DEVICES) and create separate graph operations for each GPU device (possibly just by modifying decorators).
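Detecting the devices from CUDA_VISIBLE_DEVICES could look roughly like this (a sketch only; the helper name and the fallback_count option are hypothetical, not part of the codebase):

```python
import os

def available_gpu_devices(fallback_count=0):
    """Return TF device strings for the GPUs visible to this process.

    Reads CUDA_VISIBLE_DEVICES if set; otherwise falls back to a
    (hypothetical) fallback_count config option.
    """
    visible = os.environ.get("CUDA_VISIBLE_DEVICES")
    if visible is not None:
        # CUDA renumbers the visible devices from 0, so the n-th entry
        # becomes /gpu:n inside the process regardless of its real id.
        ids = [d for d in visible.split(",") if d.strip()]
        count = len(ids)
    else:
        count = fallback_count
    return ["/gpu:{}".format(i) for i in range(count)]
```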

The variables would be stored either on the CPU or on one of the GPUs (this should also be specified by a config option). This can probably be done by setting the PS device on the whole graph. The device for graph operations would then be overridden in the specific sections of code (again, hopefully just by modifying decorators). Also, some changes to the way we update variables will be needed.
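The PS-device placement amounts to a device function that pins variable-creating ops to the parameter-server device and everything else to the current worker device; this is what tf.train.replica_device_setter does. A simplified plain-Python sketch of the decision logic (the function names here are illustrative, not from the codebase):

```python
def make_device_setter(ps_device="/cpu:0", worker_device="/gpu:0"):
    """Place variables on ps_device, everything else on worker_device.

    Mimics the core decision of tf.train.replica_device_setter: the
    returned function would be passed to tf.device(...) so variable
    creation is pinned to the PS device while ops run on the worker.
    """
    variable_op_types = {"Variable", "VariableV2", "VarHandleOp"}

    def _setter(op):
        if op.type in variable_op_types:
            return ps_device
        return worker_device

    return _setter
```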

This is only a multi-GPU solution, the support for fully distributed computing would probably require some more work. But the multi-GPU solution should be a good starting point.
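For context, the heart of the tutorial's tower approach is averaging the per-GPU gradients before applying a single update. Stripped of the TensorFlow calls, the averaging step is an element-wise mean over the towers (a simplified sketch with plain floats standing in for gradient tensors):

```python
def average_gradients(tower_grads):
    """Average gradients across towers.

    tower_grads is a list (one entry per GPU) of lists of
    (gradient, variable) pairs, as returned by compute_gradients.
    Returns a single list of (averaged_gradient, variable) pairs.
    In real code the mean would be a tf.reduce_mean over stacked
    gradient tensors; here plain floats stand in for tensors.
    """
    averaged = []
    for grads_and_var in zip(*tower_grads):
        grads = [g for g, _ in grads_and_var]
        var = grads_and_var[0][1]  # same variable in every tower
        averaged.append((sum(grads) / len(grads), var))
    return averaged
```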

jlibovicky commented 7 years ago

Have a look, e.g., at this tutorial or the TF documentation. I think it looks a little bit better because the graphs can run in separate processes, so they can even run on separate machines. They probably communicate using protocol buffers, so there might be some communication overhead.
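For reference, the between-graph distributed setup starts from a cluster specification mapping job names to host:port addresses; each process is started with its job name and task index and talks to the others over gRPC (which serializes via protocol buffers, hence the overhead mentioned above). A minimal sketch, with made-up host names:

```python
# A tf.train.ClusterSpec is built from a plain dict like this one.
cluster_def = {
    "ps": ["machine-a:2222"],                       # parameter server(s)
    "worker": ["machine-a:2223", "machine-b:2222"]  # one graph copy each
}
```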

jindrahelcl commented 7 years ago

@varisd are you willing to look into this? It would be great if we finally had this.

varisd commented 7 years ago

I have already assigned the issue to myself and I plan to work on it this week (and if necessary the following weeks).

varisd commented 6 years ago

I am currently swamped by other issues (mainly debugging the ensembles branch), so I am putting this on hold. I created a branch 'multigpu' for this issue and committed my initial changes.

Mostafa H wants to help out with this, so he will keep us updated (hopefully via this thread).

mhany90 commented 6 years ago

Hi, so the main issue seems to be that 'tf.train.Supervisor' freezes the graph, so any modifications, such as those to 'runtime_loss' in the decoder, cause this: RuntimeError: Graph is finalized and cannot be modified.

varisd commented 6 years ago

Yes, that's the problem I ran into.

I guess we need to move the tf_manager.init_supervisors() call out of tf_manager.init(), probably to the runner/training_loop. However, the problem might be somewhere else.

mhany90 commented 6 years ago

Yeah, I think the exact part which freezes the graph is this:

When tf_manager.initialize_model_parts is called in learning_utils, it calls tf_manager.get_sessions(), which calls sv.prepare_or_wait_for_session, and that is what freezes it, I think, not tf_manager.init_supervisors().

So then, I think any call to tf_manager.get_sessions() freezes the graph, even this one:

tb_writer = tf.summary.FileWriter(
    log_directory, tf_manager.get_sessions()[0].graph)

I'm not sure of how to avoid that.
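One way around it would be to make sure everything that touches the graph (summaries, the FileWriter, any extra loss ops) is constructed before the first get_sessions() call finalizes it. The ordering constraint can be illustrated with a toy stand-in for the finalized graph (a simplified sketch, not TensorFlow code):

```python
class ToyGraph:
    """Mimics tf.Graph's finalize(): no additions allowed afterwards."""

    def __init__(self):
        self.ops = []
        self.finalized = False

    def add_op(self, name):
        if self.finalized:
            raise RuntimeError("Graph is finalized and cannot be modified.")
        self.ops.append(name)

    def finalize(self):
        self.finalized = True

# Safe order: build every op first, then let the session preparation
# (sv.prepare_or_wait_for_session in our case) finalize the graph.
graph = ToyGraph()
graph.add_op("runtime_loss")
graph.add_op("summary_writer_setup")
graph.finalize()
```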