Prerequisites
Please answer the following questions for yourself before submitting an issue.

1. The entire URL of the file you are using
https://github.com/tensorflow/models/blob/38f1ebe031544418a36edda387c9234145480e53/research/object_detection/meta_architectures/ssd_meta_arch.py#L525

2. Describe the bug
The 'predict' method of SSDMetaArch has a memory leak: it leaks 100-200 MB for each batch of 32 320x320 images during my training loop.

3. Steps to reproduce
The leak can be observed in this official tutorial:
https://github.com/tensorflow/models/blob/38f1ebe031544418a36edda387c9234145480e53/research/object_detection/colab_tutorials/eager_few_shot_od_training_tflite.ipynb
To see it happening, just add memory-profiler and decorate train_step_fn with @profile, as in the sketch below. The leak happens regardless of whether execution occurs inside a GradientTape.
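Concretely, the only change needed is the decorator (a sketch: the function body below is copied from my version of the notebook, shown in section 5; model is the SSDMetaArch instance built earlier in the tutorial):

import tensorflow as tf
from memory_profiler import profile

@profile
def train_step_fn(image_tensors,
                  groundtruth_boxes_list,
                  groundtruth_classes_list):
    # Each call now prints a per-line memory report like the one in section 5.
    shapes = tf.constant(len(image_tensors) * [[320, 320, 3]], dtype=tf.int32)
    model.provide_groundtruth(
        groundtruth_boxes_list=groundtruth_boxes_list,
        groundtruth_classes_list=groundtruth_classes_list)
    # Remove the per-image batch dimension of 1 before prediction.
    concatted = tf.reshape(tf.concat(image_tensors, axis=0),
                           (len(image_tensors), 320, 320, 3))
    prediction_dict = model.predict(concatted, shapes)
    losses_dict = model.loss(prediction_dict, shapes)
    return (losses_dict['Loss/localization_loss'] +
            losses_dict['Loss/classification_loss'])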
4. Expected behavior
RAM usage should not increase between calls to predict.
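For concreteness, a check along these lines should show a flat memory profile (a sketch of my own, not code from the notebook: psutil and the dummy all-zeros batch are assumptions, and model is again the tutorial's SSDMetaArch instance):

import psutil
import tensorflow as tf

process = psutil.Process()
images = tf.zeros([32, 320, 320, 3], dtype=tf.float32)
shapes = tf.constant(32 * [[320, 320, 3]], dtype=tf.int32)

# After the first call has warmed everything up, resident memory should
# stay roughly flat; instead it grows by 100-200 MB per iteration.
for step in range(10):
    model.predict(images, shapes)
    print(f'step {step}: RSS = {process.memory_info().rss / 2**20:.1f} MiB')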
5. Additional context
Here's a sample memory-profiler dump from my own version of the notebook (not identical to the one linked to above):
Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
   142   1549.4 MiB   1549.4 MiB           1   @profile
   143                                         def train_step_fn(image_tensors,
   144                                                           groundtruth_boxes_list,
   145                                                           groundtruth_classes_list):
   146                                           """A single training iteration.
   147
   148                                           Args:
   149                                             image_tensors: A list of [1, height, width, 3] Tensor of type tf.float32.
   150                                               Note that the height and width can vary across images, as they are
   151                                               reshaped within this function to be 320x320.
   152                                             groundtruth_boxes_list: A list of Tensors of shape [N_i, 4] with type
   153                                               tf.float32 representing groundtruth boxes for each image in the batch.
   154                                             groundtruth_classes_list: A list of Tensors of shape [N_i, num_classes]
   155                                               with type tf.float32 representing groundtruth boxes for each image in
   156                                               the batch.
   157
   158                                           Returns:
   159                                             A scalar tensor representing the total loss for the input batch.
   160                                           """
   161   1549.4 MiB      0.0 MiB           1     shapes = tf.constant(len(image_tensors) * [[320, 320, 3]], dtype=tf.int32)
   162   1549.4 MiB      0.0 MiB           1     model.provide_groundtruth(
   163   1549.4 MiB      0.0 MiB           1         groundtruth_boxes_list=groundtruth_boxes_list,
   164   1549.4 MiB      0.0 MiB           1         groundtruth_classes_list=groundtruth_classes_list)
   165                                             # The images each have a pointless batch dimension of 1, so do a reshape
   166                                             # to remove this from the result of concatenation
   167   1549.4 MiB      0.0 MiB           1     concatted = tf.reshape(tf.concat(image_tensors, axis=0), (len(image_tensors), 320, 320, 3))
   168   1737.8 MiB    188.4 MiB           1     prediction_dict = model.predict(concatted, shapes)
   169   1737.8 MiB      0.0 MiB           1     losses_dict = model.loss(prediction_dict, shapes)
   170   1737.8 MiB      0.0 MiB           1     total_loss = losses_dict['Loss/localization_loss'] + losses_dict['Loss/classification_loss']
   180   1737.8 MiB      0.0 MiB           1     return total_loss
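Note that the entire 188.4 MiB increment is attributed to the model.predict call; provide_groundtruth and model.loss show no growth. To check whether the retained memory is garbage-collectable Python objects or something held at a lower level, one can force a collection between steps (a hypothetical diagnostic on my part, not something the dump above establishes; the arguments are built per step as in the tutorial's training loop):

import gc

for _ in range(10):
    train_step_fn(image_tensors,
                  groundtruth_boxes_list,
                  groundtruth_classes_list)
    # If RSS keeps climbing even after an explicit collection, the leak
    # is not sitting in garbage-collectable Python objects.
    gc.collect()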
6. System information
Dockerfile: tensorflow/tensorflow:2.4.1-gpu