I've just tested it on a different machine (Ubuntu, GPU enabled, TF 1.4.1) and I also get the same errors.
@ebrevdo do you have any suggestions?
@nikita68 this is a very involved example. Can you provide a much smaller, minimal failure case?
At least try running your while_loops with parallel_iterations=1, since it looks like you're assigning values inside your body; this is going to happen concurrently and mess everything up :-p
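For later readers, here is a minimal sketch (my own, not from this thread) of the pattern being warned about: an assignment inside the loop body is a side effect, and with parallel_iterations > 1 the assigns from different iterations can interleave, while parallel_iterations=1 plus a control dependency serializes them.

import tensorflow as tf

v = tf.Variable(0, trainable=False)

def body(i):
    # The assign is a side effect inside the body; with parallel_iterations > 1,
    # assigns from different iterations may run concurrently and interleave.
    assign_op = v.assign(i)
    with tf.control_dependencies([assign_op]):
        return tf.identity(i) + 1

# parallel_iterations=1 serializes the iterations.
final_i = tf.while_loop(lambda i: i < 5, body, [tf.constant(0)],
                        parallel_iterations=1)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(final_i))  # 5
    print(sess.run(v))        # 4: the last value assigned inside the loop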
@ebrevdo I completely forgot about that parameter! Unfortunately, the error still persists. I'm trying to put together a really small example showing where the code fails, but in the meantime here is a minimal version of the code above (the error is in the body function inside run_full_transducer):
import tensorflow as tf
from tensorflow.contrib.rnn import LSTMCell, LSTMStateTuple
from tensorflow.python.layers import core as layers_core
import numpy as np

# NOTE: Time major

# Constants
input_dimensions = 1
vocab_size = 3
input_embedding_size = 20
encoder_hidden_units = 64
batch_size = 1
input_block_size = 2


# ----------------- Model -------------------------------

class Model(object):
    def __init__(self):
        self.encoder_inputs, self.encoder_inputs_length, self.encoder_hidden_state, \
            self.encoder_outputs, self.encoder_hidden_state_new = self.build_encoder_model()

    def build_encoder_model(self):
        encoder_inputs = tf.Variable(tf.zeros(shape=(input_block_size, batch_size, input_dimensions)),
                                     dtype=tf.float32, name='encoder_inputs', trainable=False)
        encoder_inputs_length = tf.Variable([tf.shape(encoder_inputs)[0]], dtype=tf.int32,
                                            name='encoder_inputs_length', trainable=False)
        encoder_hidden_state = tf.Variable(tf.zeros(shape=(2, 1, encoder_hidden_units)), dtype=tf.float32,
                                           name='encoder_hidden_state')  # Save the state as one tensor
        encoder_inputs_embedded = encoder_inputs

        # Build model
        encoder_cell = tf.contrib.rnn.LSTMCell(encoder_hidden_units)

        # Unpack the previous state
        encoder_hidden_c, encoder_hidden_h = tf.split(encoder_hidden_state, num_or_size_splits=2, axis=0)
        encoder_hidden_c = tf.reshape(encoder_hidden_c, shape=[-1, encoder_hidden_units])
        encoder_hidden_h = tf.reshape(encoder_hidden_h, shape=[-1, encoder_hidden_units])
        encoder_hidden_state_t = LSTMStateTuple(encoder_hidden_c, encoder_hidden_h)

        # encoder_outputs: [max_time, batch_size, num_units]
        encoder_outputs, encoder_hidden_state_new = tf.nn.dynamic_rnn(
            encoder_cell, encoder_inputs_embedded,
            sequence_length=encoder_inputs_length, time_major=True,
            dtype=tf.float32, initial_state=encoder_hidden_state_t)

        # Pack encoder_hidden_state_new into one tensor so that it can be fed back in again without problems.
        encoder_hidden_state_new = tf.concat([encoder_hidden_state_new.c, encoder_hidden_state_new.h], axis=0)
        encoder_hidden_state_new = tf.reshape(encoder_hidden_state_new, shape=[2, -1, encoder_hidden_units])

        return encoder_inputs, encoder_inputs_length, encoder_hidden_state, encoder_outputs, encoder_hidden_state_new


model = Model()
# ----------------- Training --------------------------

def run_full_transducer():
    # Inputs
    max_blocks = tf.placeholder(dtype=tf.int32, name='max_blocks')  # How often to run the encoder
    inputs_full_raw = tf.placeholder(shape=(None, batch_size, input_dimensions), dtype=tf.float32,
                                     name='inputs_full_raw')

    # Reshape the inputs so that one block can be indexed per loop iteration
    inputs_full = tf.reshape(inputs_full_raw, shape=[max_blocks, input_block_size, batch_size, input_dimensions])

    # Hidden states
    encoder_hidden_init = tf.ones(shape=(2, 1, encoder_hidden_units))
    init_state = (0, encoder_hidden_init)

    def cond(current_block, encoder_hidden):
        return current_block < max_blocks

    def body(current_block, encoder_hidden):
        # Process encoder
        model.encoder_inputs = model.encoder_inputs.assign(inputs_full[current_block])
        model.encoder_inputs_length = model.encoder_inputs_length.assign([tf.shape(model.encoder_inputs)[0]])
        model.encoder_hidden_state = model.encoder_hidden_state.assign(encoder_hidden)
        # TODO: The error is SOMETIMES gone when using tf.Print. If you comment out the next 2 lines,
        # the return value is 0.
        current_block = tf.Print(current_block, [model.encoder_inputs], message='Enc in: ')
        current_block = tf.Print(current_block, [model.encoder_outputs], message='Enc out: ')
        return current_block + 1, model.encoder_hidden_state_new

    _, final_enc_state = tf.while_loop(cond, body, init_state, parallel_iterations=1)
    return max_blocks, inputs_full_raw, model.encoder_outputs, final_enc_state


# ---------------------- Management -----------------------------

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    inp_max_blocks, inp_inputs_full_raw, enc_out, fin_enc_state = run_full_transducer()
    out, _ = sess.run([enc_out, fin_enc_state], feed_dict={
        inp_max_blocks: 3,  # How often to run the encoder
        inp_inputs_full_raw: np.ones(shape=(3 * input_block_size, 1, input_dimensions))  # Full inputs
    })
    print('Encoder outputs: ' + str(out))
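Regarding the TODO in body above: tf.Print is an identity op with a printing side effect, so threading current_block through it makes the printed tensors part of each iteration's dataflow. That changes what actually gets evaluated per iteration, which can mask or expose the bug. A minimal sketch (my own) of how tf.Print gets wired into a loop:

import tensorflow as tf

def body(i):
    # tf.Print returns its first argument unchanged, but prints the tensors
    # in the list (to stderr) every time the returned op runs. Because the
    # loop variable flows through it, whatever tf.Print reads is pulled into
    # every iteration's dataflow.
    i = tf.Print(i, [i], message='iteration: ')
    return i + 1

out = tf.while_loop(lambda i: i < 3, body, [tf.constant(0)])

with tf.Session() as sess:
    print(sess.run(out))  # returns 3; 'iteration: 0', ... appear on stderr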
@ebrevdo I've tried to make a smaller failure case, but I can't find a different way to reproduce it than the code in my previous comment. It does seem, though, that the encoder's tensors are evaluated only once at the start of the while loop, and the later results are just the previous values, without re-evaluation.
EDIT: I've read up on while loops and realized that it is not possible to re-evaluate tensors defined outside of the loop from within it. Because of this I will probably rewrite my model, which makes this bug obsolete for my case. Given the obscure conditions needed to trigger it, I'm closing the issue.
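For anyone restructuring a similar model along the lines of the EDIT above: the usual pattern is to create the recurrent ops inside the loop body and carry the state as loop variables, instead of assigning to variables defined outside the loop. A minimal sketch of that pattern (my own, assuming TF 1.x; all names are illustrative):

import tensorflow as tf
from tensorflow.contrib.rnn import LSTMCell, LSTMStateTuple

hidden_units = 64
batch_size = 1
input_dims = 1

# Time-major inputs: [time, batch, dims]
seq = tf.placeholder(tf.float32, [None, batch_size, input_dims])
n_steps = tf.shape(seq)[0]

cell = LSTMCell(hidden_units)
# Call the cell once outside the loop so its variables are created outside
# the control-flow context; later calls to the same object reuse the weights.
cell(seq[0], cell.zero_state(batch_size, tf.float32))

def body(t, c, h):
    # The cell op is created inside the body, so it runs every iteration;
    # the state travels as loop variables instead of through tf.Variables.
    _, (c, h) = cell(seq[t], LSTMStateTuple(c, h))
    return t + 1, c, h

_, final_c, final_h = tf.while_loop(
    lambda t, c, h: t < n_steps, body,
    [tf.constant(0),
     tf.zeros([batch_size, hidden_units]),
     tf.zeros([batch_size, hidden_units])])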
Hello! I believe I have found a bug in TensorFlow when running the code below. I am currently building a neural transducer, and have stumbled across TF sometimes not returning any values for a tensor. I have not yet had the chance to test this on another machine (current setup: no GPU, TF 1.4.1, Ubuntu 17.10). The code is redacted a bit to highlight only the parts that fail. I've also posted this to StackOverflow, but didn't get a response there.
Notes:
Example of a correct return value (more or less):
Incorrect:
Code:
System information:
Thanks! Nikita