tensorflow / profiler

A profiling and performance analysis tool for TensorFlow

Memory breakdown table : invalid data type, no region type and no shape for operations in the memory profile tab #255

RocaVincent commented 3 years ago

Hi,

I use the Profiler with TensorFlow 2 and an NVIDIA Quadro RTX 6000 GPU. When I inspect the memory usage of my model, some operations in the memory breakdown table of the memory profile tab show an invalid data type, no shape and no region type. I'm wondering whether this is normal or whether I need to worry about it for my model. Below is a small piece of code that reproduces this kind of result.

import tensorflow as tf
import keras

IMAGE_SHAPE = [256,256,3]

def Discriminator():
    return keras.Sequential([
        keras.layers.Flatten(input_shape=IMAGE_SHAPE),
        keras.layers.Dense(1, activation="sigmoid")
    ])

def Generator():
    return keras.Sequential([
        keras.layers.Conv2D(filters=IMAGE_SHAPE[-1], kernel_size=3, strides=1, padding="same", use_bias=False,
                           input_shape=IMAGE_SHAPE)
    ])

generator_BtoA = Generator()
discriminator_A = Discriminator()

loss_obj = keras.losses.MeanSquaredError()

discriminator_A_optimizer = keras.optimizers.Adam(0.0002)

BATCH_SIZE = 32

@tf.function
def train_step():
    # discriminator training
    imagesA = tf.random.uniform([BATCH_SIZE]+IMAGE_SHAPE)
    imagesB = tf.random.uniform([BATCH_SIZE]+IMAGE_SHAPE)
    fakesA = generator_BtoA(imagesB, training=False)
    with tf.GradientTape(persistent=True) as tape:
        disc_fakesA = discriminator_A(fakesA, training=True)
        discA_loss = loss_obj(tf.zeros_like(disc_fakesA), disc_fakesA)
    gradients_discA = tape.gradient(discA_loss, discriminator_A.trainable_variables)
    discriminator_A_optimizer.apply_gradients(zip(gradients_discA, discriminator_A.trainable_variables))

from tensorflow.profiler.experimental import Trace as Trace_profiler, start as start_profiler, stop as stop_profiler

start_profiler("toy_logdir/")
with Trace_profiler("train", step_num=1, _r=-1):
    train_step()
stop_profiler()

With this code, I get the following results in the memory profile tab:

| Op Name | Allocation Size (GiBs) | Requested Size (GiBs) | Occurrences | Region type | Data type | Shape |
| --- | --- | --- | --- | --- | --- | --- |
| sequential/conv2d/Conv2D | 0.227 | 0.227 | 1 | | INVALID | |
| sequential/conv2d/Conv2D | 0.039 | 0.039 | 1 | | INVALID | |
| sequential/conv2d/Conv2D | 0.023 | 0.023 | 1 | output | float | [32,3,256,256] |
| sequential/conv2d/Conv2D-0-TransposeNHWCToNCHW-LayoutOptimizer | 0.023 | 0.023 | 1 | output | float | [32,3,256,256] |

Do you know why I get these strange results?

ckluk-github commented 3 years ago

Thanks for reporting and especially for the reproducer.

I think the two memory allocations with the INVALID data type are real and actually consume memory. They may be caused by the implementation of Conv2D, and for some unknown reason their (region type, data type, shape) cannot be inferred. I have filed a bug internally to investigate further.
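A quick way to probe that hypothesis, assuming the unnamed allocations are cuDNN workspace/autotuning scratch for Conv2D (an assumption, not something the profiler output confirms), is to rerun the reproducer with cuDNN autotuning disabled and compare the two memory breakdown tables:

import os

# Assumption: the INVALID allocations are scratch buffers requested while cuDNN
# autotunes Conv2D algorithms. Set the variable before running the reproducer
# so that the convolution picks up the setting.
os.environ["TF_CUDNN_USE_AUTOTUNE"] = "0"

import tensorflow as tf
# ... then run the reproducer above and open the memory profile tab again.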

Terranlee commented 3 years ago

Hi RocaVincent, in your code to reproduce this issue you have imagesA = tf.random.uniform([BATCH_SIZE]+IMAGE_SHAPE), but imagesA is not used by any code later.

I think the invalid data type in the memory breakdown table is related to this unused code, which leaves TF unable to infer the data type. When I remove this line, the memory allocation table looks normal:

| Op Name | Allocation Size (GiBs) | Requested Size (GiBs) | Occurrences | Region type | Data type | Shape |
| --- | --- | --- | --- | --- | --- | --- |
| sequential_2/conv2d_1/Conv2D | 0.125 | 0.105 | 1 | temp | uint8 | [113247504] |
| sequential_2/conv2d_1/Conv2D | 0.031 | 0.023 | 1 | output | float | [32,3,256,256] |
| sequential_2/conv2d_1/Conv2D-0-TransposeNHWCToNCHW-LayoutOptimizer | 0.031 | 0.023 | 1 | output | float | [32,3,256,256] |
| preallocated/unknown | 0.002 | 0.002 | 1 | persist/dynamic | INVALID | unknown |

RocaVincent commented 3 years ago

Hi @Terranlee Thank you for the answer

When I remove this line, I get exactly the same memory breakdown table as the first one I showed. In addition, the bug concerns convolution operations, which are not linked to this tensor creation. What also surprises me is the uint8 data type of the first operation in your table.

tinducvo commented 3 years ago

I'm a user, not a contributor, but if you end up needing more memory, my current workaround is to explicitly limit TensorFlow's GPU memory usage to a precalculated estimate of the model size.
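
A minimal sketch of that kind of cap, using TensorFlow's logical device configuration; the 4096 MiB limit below is a hypothetical placeholder for whatever estimate fits the model, not a value taken from this issue:

import tensorflow as tf

# Cap the amount of GPU memory TensorFlow may allocate. The 4096 MiB figure is
# a placeholder for a precalculated model-size estimate, and the call must
# happen before the GPU is initialized.
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=4096)],
    )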

RocaVincent commented 3 years ago

Hi @parkournerd, this solution doesn't increase the GPU's capacity and doesn't explain the strange results reported by the Profiler.