Virtual memory clean up when use gpu version

zhanghaoie commented 6 years ago

Hi,

I am training a model based on the pre-trained universal sentence encoder model. When I train on a GPU, the virtual memory of the process grows constantly until the system kills it. Training on a CPU works fine. Do you know the reason?

Best regards, Hao

vbardiovskyg commented 6 years ago

Hi,

We are not aware of any aspect of the universal sentence encoder model that could cause a memory leak. On a side note, the universal sentence encoder model is a DAN encoder, so using a GPU will not boost the speed much.
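For readers unfamiliar with the term: a DAN (deep averaging network) averages the token embeddings of a sentence and passes the average through a few feed-forward layers. The snippet below is only a minimal pure-Python sketch of that idea. The function name `dan_encode`, the toy vectors and the single `tanh` layer are assumptions made for illustration and are not the actual universal sentence encoder.

```python
import math


def dan_encode(token_vectors, layers):
    """Average the token vectors, then apply feed-forward layers with tanh."""
    dim = len(token_vectors[0])
    count = len(token_vectors)
    # Step 1: unweighted average of the token embeddings (word order is ignored).
    hidden = [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]
    # Step 2: a small stack of dense layers, each given as (weights, biases).
    for weights, biases in layers:
        hidden = [
            math.tanh(sum(w * h for w, h in zip(row, hidden)) + b)
            for row, b in zip(weights, biases)
        ]
    return hidden


# Hypothetical 2-dimensional embeddings for a three-token sentence.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# One hypothetical dense layer: a 2x2 weight matrix and a zero bias vector.
layers = [([[0.5, 0.5], [1.0, -1.0]], [0.0, 0.0])]
sentence_vector = dan_encode(tokens, layers)
```

The whole encoder is an average followed by a handful of small matrix products, which is consistent with the remark that a GPU is unlikely to speed it up much.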

Could the following thread be related to your use case: https://groups.google.com/a/tensorflow.org/forum/#!msg/hub/ONjFEdpJFp8/si6Vpk_QAwAJ ?


-- Vojtech

zhanghaoie commented 6 years ago

Hi, Thank you very much for your reply.

I have not defined the USE model inside any loop.

Below is the graph I have built:

def _sent_network(self,embed,h_inputs,b_inputs,max_length):

    h_embeddings = embed(h_inputs)
    b_embeddings = embed(b_inputs)
    h_embeddings = tf.expand_dims(h_embeddings,axis=1)
    h_embeddings = tf.tile(h_embeddings,(1,max_length,1))
    h_embeddings = tf.reshape(h_embeddings,(-1,512))

    outputs = tf.concat([h_embeddings,b_embeddings],axis=1)

    for layer in range(self.mlp_layers):

        if self.dropout_rate:
            outputs = tf.layers.dropout(outputs,rate=self.dropout_rate,training=self._training)
        outputs = tf.layers.dense(outputs,self.num_neurons[layer],activation=self.activation,kernel_initializer=self.initializer,
                                     name = "hidden{}".format(layer+1))

        outputs = self.activation(outputs,name= "hidden{}_out".format(layer+1))

    return outputs

def _loss(self,y_,predicts,lengths,max_length):

    predicts = tf.reshape(predicts,shape=(-1,max_length,2))
    masks = tf.sequence_mask(lengths,maxlen=max_length)
    masks = tf.cast(masks,dtype=tf.float32)
    masks = tf.expand_dims(masks,axis=2)
    masks = tf.tile(masks,multiples=[1,1,2])
    predicts = tf.multiply(predicts,masks)
    predicts = tf.reshape(predicts,shape=(-1,2))
    xentropy = tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_,logits=predicts)
    loss = tf.reduce_mean(xentropy,name="loss")

    return loss

def _construct_graph(self,n_outputs):

    if self.randome_state:
        tf.set_random_seed(self.randome_state)
        np.random.seed(self.randome_state)

    X_heads = tf.placeholder(tf.string,shape=[None],name="X_heads")
    X_bodies = tf.placeholder(tf.string,shape=[None],name="X_bodies")
    lengths = tf.placeholder(tf.int32,shape=[None],name="b_lengths")
    max_length = tf.placeholder(tf.int32,shape=(),name="max_length")
    y_ = tf.placeholder(tf.int32,shape=[None],name="y")
    y_one_hot = tf.one_hot(y_,n_outputs,on_value=1.0,off_value=0.0,axis=-1,dtype=tf.float32)
    # ratios = tf.constant(ratios,tf.float32)

    embed = hub.Module("https://tfhub.dev/google/universal-sentence-encoder/1",name="pre_trained_embeddings",trainable=True)

    if self.batch_norm_momentum or self.dropout_rate:
        self._training = tf.placeholder_with_default(False,shape=[],name="training")
        self.keep_prob = tf.cond(self._training, lambda: tf.constant(1-self.dropout_rate), lambda: tf.constant(1.0))
    else:
        self._training = None

    pre_output = self._sent_network(embed,X_heads,X_bodies,max_length)
    logits = tf.layers.dense(pre_output,n_outputs,kernel_initializer=he_init,name="logits")
    probabilities = tf.nn.softmax(logits,name="probabilities")
    loss = self._loss(y_=y_one_hot,predicts=logits,lengths=lengths,max_length=max_length)

    optimizer = self.optimizer(learning_rate=self.learning_rate)
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        training_op = optimizer.minimize(loss)

    correct = tf.nn.in_top_k(logits,y_,1)
    accuracy = tf.reduce_mean(tf.cast(correct,tf.float32),name="accuracy")
    _,predicts = tf.nn.top_k(logits,k=1,sorted=False)
    confusion_matrix = tf.confusion_matrix(y_,predicts,num_classes=n_outputs,name="confusion_matrix")

    init = [tf.global_variables_initializer(),tf.tables_initializer()]
    saver = tf.train.Saver()

    if self.tensorboard_logdir:
        now = datetime.utcnow().strftime('%Y%m%d-%H%M%S')
        tb_logdir = self.tensorboard_logdir+"/run{}".format(now)
        cost_summary = tf.summary.scalar("validation_loss",loss)
        acc_summary = tf.summary.scalar("validation_accuracy",accuracy)
        merged_summary = tf.summary.merge_all()
        file_writer = tf.summary.FileWriter(tb_logdir,tf.get_default_graph())

        self._merged_summary = merged_summary
        self._file_writer = file_writer

    self._X_head,self._X_body,self._max_length,self.y = X_heads,X_bodies,max_length,y_
    self._lengths = lengths
    self._logits = logits
    self._probabilites = probabilities
    self._loss = loss
    self._training_op = training_op
    self._accuracy = accuracy
    self._confusion_matrix = confusion_matrix
    self._init,self._saver = init,saver

I printed objgraph.show_growth() at the end of each epoch. Only the first epoch shows a list of newly created objects; the second epoch shows a single new weakref, and after that no new objects are created.

tuple        450244  +450244
list         336249  +336249
dict          86754   +86754
function      35765   +35765
Tensor        23787   +23787
TensorShape   23754   +23754
Operation     21137   +21137
Dimension     20000   +20000
weakref        7131    +7131
Argument       5225    +5225
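For anyone who wants to reproduce this kind of per-type count without objgraph, the sketch below approximates it with the standard library only. It is an illustration, not objgraph's implementation; `type_counts`, `growth` and the `Marker` class are hypothetical names introduced here.

```python
import gc
from collections import Counter


def type_counts():
    """Count live, gc-tracked Python objects by type name."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after, limit=10):
    """Return (type name, new count, increase) for the types that grew most."""
    grown = [
        (name, after[name], after[name] - before.get(name, 0))
        for name in after
        if after[name] > before.get(name, 0)
    ]
    return sorted(grown, key=lambda item: item[2], reverse=True)[:limit]


class Marker(object):
    """Hypothetical class, used only to make the growth visible."""


before = type_counts()
leftovers = [Marker() for _ in range(1000)]  # stand-in for one epoch's leftovers
after = type_counts()
report = growth(before, after)
```

Counts like these only cover objects tracked by Python's garbage collector. Memory allocated by native code is not visible here, so a flat report does not rule out growth elsewhere in the process.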

Could you point out what I am doing wrong?

Thank you very much.

Best regards, Hao

vbardiovskyg commented 6 years ago

Hi,

You are right that calling the apply function on the module extends the graph. This is why one should call the apply function only during graph building.
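The effect can be illustrated with a toy model in plain Python. `ToyGraph` and `apply_module` below are hypothetical stand-ins written for this illustration. They are not TensorFlow or TF Hub code, and they only mimic the two behaviours discussed in this thread: each apply call appends an op, and a finalized graph refuses new ops.

```python
class ToyGraph(object):
    """Hypothetical stand-in for a graph; not TensorFlow's implementation."""

    def __init__(self):
        self.ops = []
        self.finalized = False

    def add_op(self, name):
        # A finalized graph is read-only, so adding an op is an error.
        if self.finalized:
            raise RuntimeError("graph is finalized and cannot be modified")
        self.ops.append(name)
        return name

    def finalize(self):
        self.finalized = True


def apply_module(graph, input_name):
    """Mimic applying a module: every call appends a fresh op."""
    return graph.add_op("embed(%s)#%d" % (input_name, len(graph.ops)))


# Anti-pattern: applying the module once per batch grows the graph.
leaky = ToyGraph()
for batch in range(3):
    apply_module(leaky, "batch")

# Build once, finalize, then only reuse the existing op.
fixed = ToyGraph()
embed_op = apply_module(fixed, "placeholder")
fixed.finalize()
try:
    apply_module(fixed, "batch")
    raised = False
except RuntimeError:
    raised = True
```

In the toy model the first graph ends up with one op per batch, while the finalized graph keeps its single op and turns any later apply call into an immediate error instead of silent growth.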

In your example, sentence_batch = embedder(sentences) is called inside the inference loop, after the graph has been built, so every iteration adds new ops to the graph.

You can reorganize your code to be consistent with https://groups.google.com/a/tensorflow.org/forum/#!msg/hub/ONjFEdpJFp8/si6Vpk_QAwAJ :

import tensorflow as tf
import tensorflow_hub as tf_hub

# Build the whole graph once, before any session runs.
g = tf.Graph()
with g.as_default():
    input_sentences = tf.placeholder(tf.string, shape=[None])
    embedder = tf_hub.Module("https://tfhub.dev/google/universal-sentence-encoder/1")
    sentence_batch = embedder(input_sentences)
    table_op = tf.tables_initializer()
    var_op = tf.global_variables_initializer()
# You can already finalize here, not through session.graph.
# It does not make a real difference, but is cleaner.
g.finalize()

with tf.Session(graph=g) as session:
    session.run([var_op, table_op])

    # X is your iterable of sentence batches; only data is fed here, no new ops are created.
    for sentences in X:
        embeddings = session.run(sentence_batch, feed_dict={input_sentences: sentences})

This doesn't prove that there is no memory leak on GPU; it just shows that your example can be reorganized so that it does not bloat the graph.
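One way to check whether memory still grows after such a reorganization is to log the memory of the process once per batch or epoch. The sketch below assumes Linux, where /proc/self/status reports VmSize and VmRSS in kB. The helper names `parse_proc_status` and `current_memory_kb` are hypothetical, and the sample text is made up for the example.

```python
def parse_proc_status(text):
    """Extract VmSize and VmRSS (in kB) from /proc/<pid>/status content."""
    usage = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            # The value looks like "  204800 kB"; keep only the number.
            usage[key] = int(value.split()[0])
    return usage


def current_memory_kb(path="/proc/self/status"):
    """Read the current process's memory figures (Linux only)."""
    with open(path) as handle:
        return parse_proc_status(handle.read())


# Made-up sample in the same layout as /proc/self/status.
sample = "Name:\tpython\nVmSize:\t  204800 kB\nVmRSS:\t   51200 kB\n"
parsed = parse_proc_status(sample)
```

Calling `current_memory_kb()` at the end of every epoch and comparing the values between a CPU run and a GPU run would show whether the growth is still there once the graph no longer changes.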

On Sun, May 20, 2018 at 10:36 PM Jonathan Foley notifications@github.com wrote:

@vbardiovskyg The universal sentence encoder does in fact lead to a monotonic increase in memory usage when used inside an inference loop. The reason, as far as I can tell, is that the call method on the module adds ops to the graph rather than reusing existing ones. If you freeze the graph, subsequent calls to embedder(sentences) throw an exception, which provides evidence for this. Destroying and re-creating the graph with each iteration is inefficient. I don't know whether this is unique to the USE or a general problem with the Module class and how it interacts with the Graph.

def embedding_worker(worker_id):
    zoltar_logging.init("emebeddingworkers%i.log" % worker_id)
    logging.info("worker %i starting", worker_id)
    r_client = StrictRedis()
    embedder = tf_hub.Module("https://tfhub.dev/google/universal-sentence-encoder/1")
    table_op = tf.tables_initializer()
    var_op = tf.global_variables_initializer()
    with tf.Session() as session:
        session.run([var_op, table_op])
        session.graph.finalize()
        update_batch = []
        for job_batch in minibatch(batch_generator(worker_id), BATCH_SIZE):
            ids, sentences = zip(*job_batch)
            t_s = time()
            sentence_batch = embedder(sentences)
            print "after creating tensor"
            print h.heap()
            logging.info("worker %i running inference", worker_id)
            embeddings = session.run(sentence_batch)
            print "after running session"
            print h.heap()
            logging.info('worker %i batch inference took: %.2f', worker_id, time() - t_s)
            update_embeddings(r_client, update_batch, zip(ids, embeddings), worker_id)
            print "after updating batch"
            print h.heap()
        if update_batch:
            COLLECTIONS['job_strings'].bulk_write(update_batch)

    logging.info("worker %i exiting", worker_id)


-- Vojtech

tensorflow / hub

Virtual memory clean up when use gpu version #45