tensorflow / tensorboard

TensorFlow's Visualization Toolkit

Reused variables create huge graphs #3118

Open cipollone opened 4 years ago

cipollone commented 4 years ago

TensorBoard graphs are really useful to check defined ops and the general net structure. However, this becomes hard when variables are reused, because the graph is cluttered with many edges that simply propagate the shared variables. For example, consider the graph associated with the following snippet:

import tensorflow as tf

# Meaningless model: the same Sequential block is applied to two inputs
input_A = tf.keras.Input(shape=(1, 2), name='Input_A')
input_B = tf.keras.Input(shape=(1, 2), name='Input_B')

inner_block = tf.keras.Sequential(
    [tf.keras.layers.Dense(1) for _ in range(50)])

output = inner_block(input_A) + inner_block(input_B)

model = tf.keras.Model([input_A, input_B], output)

# Write the graph to the TensorBoard log directory
tf.keras.callbacks.TensorBoard('.', write_graph=True).set_model(model)

The resulting graph is shown in the attached screenshot.

The 100 tensors in this graph are simply inputs to ReadVariableOp nodes. In more complex graphs, these connections become the thickest arrows, completely changing the logical structure of the net. In the second attached screenshot, the three generators are the same layer, called three times with different inputs. There, the real inputs have even been pushed out of the main graph (they come from the ImagePreprocessing layer), because of the stronger connections created by the variables.

I think that how variables are propagated is mostly a TensorFlow internal concern. Even though the graph is logically correct (the actual variable resources live in the first block), I'd like the option to hide the inputs of ReadVariableOp nodes (and, ideally, control dependencies as well).
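As a stopgap, something like the following post-processing of the GraphDef might work on the client side (a rough sketch, not an existing TensorBoard feature; the helper name strip_variable_edges is mine):

import tensorflow as tf

def strip_variable_edges(graph_def, drop_control_deps=False):
  """Return a copy of `graph_def` without the inputs of ReadVariableOp nodes.

  Optionally also drops control-dependency edges (inputs starting with '^').
  """
  stripped = tf.compat.v1.GraphDef()
  stripped.CopyFrom(graph_def)
  for node in stripped.node:
    kept = []
    for inp in node.input:
      if drop_control_deps and inp.startswith('^'):
        continue  # hide control-dependency edges
      if node.op == 'ReadVariableOp' and not inp.startswith('^'):
        continue  # hide the resource edge that feeds the variable read
      kept.append(inp)
    del node.input[:]
    node.input.extend(kept)
  return stripped

The stripped GraphDef could then be logged instead of the original one (newer TF releases can log a GraphDef directly with tf.summary.graph), but a built-in TensorBoard option would of course be much better.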

Details: TensorFlow 2.1.0-rc0, built from source.

Related thread: https://github.com/tensorflow/tensorflow/issues/9545

psybuzz commented 4 years ago

Thanks for filing the report here. I lack some contextual knowledge, but I can see that those 2 examples don't do a great job of showing the net's general structure.

In the 1st example with 2 sequential paths, what does the ideal, expected graph look like? Smaller "sequential" rectangles and thicker arrows with larger font? In the 2nd example, I'm also curious what the ideal graph looks like.

@davidsoergel, would you like to investigate?

cipollone commented 4 years ago

I think the size of the boxes and arrows is appropriate: in the first case, the name scopes contain 50 layers each; in the second, the thick arrows represent relatively large convolution kernels. What I'd like is to hide these arrows entirely. As for "ideal" graphs, here's one:

# Example 1: two distinct Generator instances (Generator is a custom layer, see note at the end)
generatorA = Generator()
generatorB = Generator()

inputs = tf.keras.Input(shape=(256, 256, 3), name='Input')
out = generatorA(inputs)
out = generatorB(out)
out = tf.identity(out, name='output')

model = tf.keras.Model(inputs, out)

Using the same block twice instead:

# Example 2: the same Generator instance applied twice
generatorA = Generator()

inputs = tf.keras.Input(shape=(256, 256, 3), name='Input')
out = generatorA(inputs)
out = generatorA(out)
out = tf.identity(out, name='output')

model = tf.keras.Model(inputs, out)

In the resulting graph, the real output is packed together with 48 other variables. The situation is even more problematic when subclassing:

# Example 3: reusing the layer inside a subclassed layer
class OuterLayer(tf.keras.layers.Layer):

  def build(self, input_shape):
    self.generator = Generator()
    self.built = True

  def call(self, inputs):
    # The same Generator instance is called twice
    out = self.generator(inputs)
    out = self.generator(out)
    out = tf.identity(out, name='output')
    return out

inputs = tf.keras.Input(shape=(256, 256, 3), name='Input')
out = OuterLayer()(inputs)
model = tf.keras.Model(inputs, out)

Apart from the huge control dependencies on the Identity op, there is a thick connection between the two Generator blocks.

I've noticed that in Example 2 the edges are scalars coming from the "resource" nodes inside the variables, while in Example 3 they are actual values coming from ReadVariableOp nodes.
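For reference, this is roughly how I inspect which op feeds each edge (a quick sketch; it assumes the model from the examples above and traces it through a concrete function to get a GraphDef):

import tensorflow as tf

# Trace the Keras model to obtain a GraphDef (assumes `model` from the
# examples above, with a single (256, 256, 3) input).
@tf.function
def forward(x):
  return model(x)

graph_def = forward.get_concrete_function(
    tf.TensorSpec([None, 256, 256, 3])).graph.as_graph_def()

# For every edge, print the op of the source node and the op of the target
# node; this makes the "resource" vs. ReadVariableOp distinction visible.
op_by_name = {node.name: node.op for node in graph_def.node}
for node in graph_def.node:
  for inp in node.input:
    src = inp.lstrip('^').split(':')[0]  # strip control marker / output index
    print(op_by_name.get(src, '?'), '->', node.op, '(' + node.name + ')')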

(Generator is just a custom layer that calls other Keras layers; I never add variables myself. As the first example shows, this also applies to built-in Keras layers.)
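For completeness, a minimal stand-in for Generator could look like this (hypothetical; the real layer is larger but, like this one, only composes built-in Keras layers):

import tensorflow as tf

# Hypothetical stand-in for the Generator layer used in the examples above:
# it only composes built-in Keras layers and adds no variables directly.
class Generator(tf.keras.layers.Layer):

  def build(self, input_shape):
    self.convs = [
        tf.keras.layers.Conv2D(3, 3, padding='same', activation='relu')
        for _ in range(3)
    ]

  def call(self, inputs):
    out = inputs
    for conv in self.convs:
      out = conv(out)
    return out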