xiang-deng opened this issue 6 years ago
In my case, I am training an lstm_seq2seq model with the lstm_seq2seq hparams, and I got the following error when I tried to use multiple GPUs:

> ValueError: Cannot use 'lstm_seq2seq/parallel_1_5/lstm_seq2seq/body/lstm_seq2seq/encoder/rnn/while/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/MatMul' as input to 'Identity_2' because they are in different while loops. See info log for more details.
I want to follow up on this issue. I recently modified the model body and introduced a while loop into it, which gives an error very similar to the one from @mapingshuo. The code works fine on a single GPU but crashes on multiple GPUs, raising a ValueError involving two identities from different name scopes (parallel_0_5 and parallel_1_5, for instance). Disabling daisy_chain_variables gets past this error, but leads to an even worse mess.
As I understand it, this is because the parallel scope in data_parallelism is only a name_scope, so it does not prevent variables from parallel_1 being fetched as inputs in parallel_0; hence the while_loop check error.
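For illustration, here is a minimal TF 1.x sketch (the scope names are mine, not taken from data_parallelism) showing that a name_scope only prefixes op names and does not isolate variables between towers:

```python
import tensorflow as tf  # TF 1.x

# name_scope only prefixes op names; tf.get_variable ignores it, so the
# two "towers" below fetch the same underlying variable object.
with tf.variable_scope("body", reuse=tf.AUTO_REUSE):
    with tf.name_scope("parallel_0"):
        v0 = tf.get_variable("w", shape=[2])
    with tf.name_scope("parallel_1"):
        v1 = tf.get_variable("w", shape=[2])

print(v0 is v1)  # True: one variable, shared across both name scopes
```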
On the other hand, fetching variables from different copies might not be a bad idea, since that's how we train multi-GPU models, I believe; it just unfortunately does not work when a while_loop is involved.
Maybe modifying daisy_chain_getter and adding a regex check against while-loop variables would work? Would be great to get a pointer for tackling this issue.
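Something along these lines might work as a guard. This is only a sketch: it relies on TF 1.x control-flow internals rather than public API, and `make_guarded_getter` / `_in_while_loop` are hypothetical names of mine, not functions in tensor2tensor:

```python
import tensorflow as tf

def _in_while_loop():
    # Walk the graph's current control-flow context (TF 1.x internals;
    # not public API and may change between versions) to see whether we
    # are currently building ops inside a tf.while_loop frame.
    ctxt = tf.get_default_graph()._get_control_flow_context()  # pylint: disable=protected-access
    while ctxt is not None:
        if ctxt.IsWhileContext():
            return True
        ctxt = ctxt.outer_context
    return False

def make_guarded_getter(daisy_chain_getter):
    # Wrap an existing custom getter so that, inside a while loop, the raw
    # variable is returned instead of a daisy-chained tf.identity copy,
    # keeping the copy op from crossing the loop boundary.
    def guarded_getter(getter, name, *args, **kwargs):
        if _in_while_loop():
            return getter(name, *args, **kwargs)
        return daisy_chain_getter(getter, name, *args, **kwargs)
    return guarded_getter
```

Checking the control-flow context directly avoids the fragility of matching "while" in scope-name strings, though it trades that for a dependency on internal APIs.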
> This setting controls whether to copy variables around in a daisy chain (if true) or leave their placement to TensorFlow. It only affects multi-device training and mostly should be turned on for performance. One exception is recurrent models: with dynamic loops it must be off.
Kindly refer to this; it worked for me!
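For concreteness, a minimal sketch of turning the setting off via a registered hparams set, assuming tensor2tensor's registry and the stock lstm_seq2seq hparams (the new function name is mine):

```python
from tensor2tensor.models import lstm
from tensor2tensor.utils import registry

@registry.register_hparams
def lstm_seq2seq_no_daisy_chain():
    hparams = lstm.lstm_seq2seq()
    hparams.daisy_chain_variables = False  # leave variable placement to TF
    return hparams
```

Since t2t-trainer accepts hparams overrides on the command line, something like `--hparams='daisy_chain_variables=false'` should have the same effect without code changes.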
I added a recurrent layer to the model body, since in my model the current output depends on the previous one. It runs well on a single GPU but fails when using multiple GPUs. The log looks as below:
I use the raw `tf.nn.dynamic_rnn`; is there extra modification needed for it to work? Thanks in advance.
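For reference, a minimal body of the kind described here (the function and names below are illustrative, not from the original report). `tf.nn.dynamic_rnn` builds an internal tf.while_loop, which is exactly what collides with the daisy-chain identity copies under multi-GPU data parallelism:

```python
import tensorflow as tf  # TF 1.x

def body(inputs, hparams):
    # dynamic_rnn iterates via tf.while_loop at graph-build time, so any
    # variable it reads must not be an identity copy made outside the loop.
    cell = tf.nn.rnn_cell.BasicLSTMCell(hparams.hidden_size)
    outputs, _ = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
    return outputs
```

With daisy_chain_variables left on, the getter wraps each variable read in a tf.identity outside the loop and the while-loop check rejects it as cross-loop input; turning the setting off, as noted above, avoids that.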