keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

model.summary() broken for custom models subclassed from keras.Model #19535

Closed eschmitt88 closed 4 months ago

eschmitt88 commented 6 months ago

Current behavior?

Custom model classes subclassed from keras.Model do not register as built, and model.summary() is missing shape and parameter information. However, the model runs just fine. In Keras 2.15.0 it works properly; for example (from "Standalone code to reproduce" below, taken exactly from the Keras documentation), the output is as expected:

Model: "my_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               multiple                  352       

 dense_1 (Dense)             multiple                  165       

=================================================================
Total params: 517 (2.02 KB)
Trainable params: 517 (2.02 KB)
Non-trainable params: 0 (0.00 Byte)

In keras 3.2.1 and keras-nightly (colab), we instead see this:

/usr/local/lib/python3.10/dist-packages/keras/src/layers/layer.py:360: UserWarning: `build()` was called 
on layer 'my_model', however the layer does not have a `build()` method implemented and it looks like 
it has unbuilt state. This will cause the layer to be marked as built, despite not being actually built, which 
may cause failures down the line. Make sure to implement a proper `build()` method.
  warnings.warn(
Model: "my_model"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                         ┃ Output Shape                ┃         Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ dense (Dense)                        │ ?                           │     0 (unbuilt) │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_1 (Dense)                      │ ?                           │     0 (unbuilt) │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
 Total params: 0 (0.00 B)
 Trainable params: 0 (0.00 B)
 Non-trainable params: 0 (0.00 B)

While it doesn't break model training and inference, I still think it's an important issue, because I often rely on the model.summary() to check my work as I develop. Thank you to whoever helps out.

Standalone code to reproduce the issue

import keras

class MyModel(keras.Model):
    def __init__(self):
        super().__init__()
        self.dense1 = keras.layers.Dense(32, activation="relu")
        self.dense2 = keras.layers.Dense(5, activation="softmax")

    def call(self, inputs):
        x = self.dense1(inputs)
        return self.dense2(x)

model = MyModel()
model.build(input_shape=(None, 10))
model.summary()

Relevant log output

(repeat from above)

fchollet commented 6 months ago

the layer does not have a build() method implemented and it looks like it has unbuilt state. This will cause the layer to be marked as built, despite not being actually built, which may cause failures down the line. Make sure to implement a proper build() method.

As indicated by this message, you need to implement a build() method, e.g.

class MyModel(keras.Model):
    def __init__(self):
        super().__init__()
        self.dense1 = keras.layers.Dense(32, activation="relu")
        self.dense2 = keras.layers.Dense(5, activation="softmax")

    def build(self, input_shape):
        self.dense1.build(input_shape)
        input_shape = self.dense1.compute_output_shape(input_shape)
        self.dense2.build(input_shape)
        self.built = True

    def call(self, inputs):
        x = self.dense1(inputs)
        return self.dense2(x)

You could also build your model simply by calling it on a batch of data before you start using it. That is also a strategy you can apply inside build() to build the model.
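The dummy-batch strategy suggested above can be sketched like this (a minimal sketch reusing the Dense sizes from the repro; the (1, 10) batch shape is an assumption matching model.build(input_shape=(None, 10)) in the original code):

```python
import numpy as np
import keras

class MyModel(keras.Model):
    def __init__(self):
        super().__init__()
        self.dense1 = keras.layers.Dense(32, activation="relu")
        self.dense2 = keras.layers.Dense(5, activation="softmax")

    def call(self, inputs):
        x = self.dense1(inputs)
        return self.dense2(x)

model = MyModel()
# Calling the model once on a dummy batch builds every layer,
# so summary() can report real shapes and parameter counts.
_ = model(np.zeros((1, 10), dtype="float32"))
model.summary()
```

With the layers built this way, summary() reports the expected 517 parameters (352 + 165) rather than "0 (unbuilt)".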

SarthakNikhal commented 6 months ago

@sachinprasadhs Can I help with this issue?

eschmitt88 commented 6 months ago

@fchollet thanks for the tip! I wonder, perhaps we could put what you have there into the documentation for subclassing the Model class? I'm curious why Keras 2.15.0 seemed not to require a custom build() method.

DLumi commented 5 months ago

perhaps we could throw what you have there into the documentation for subclassing the model class?

I second this.

@fchollet And while we're at it, could you clarify whether having ? as the Output Shape of a built model is intended? It seems super minor, as everything appears to work just fine, but it's been bugging me. Plus, since the summary utility looks at layer._inbound_nodes to assign that info, I'm concerned that the layers might not be connected properly.

I've made a short notebook for reproduction (basically, it's your model from the example above): https://colab.research.google.com/drive/1HVrm9yyStskvRniPFCOeOAPdWPVZYZtg

Mygit123abc commented 5 months ago

(quoting fchollet's reply above, including the build() example)

Not working in tf 2.16. This library is so shitty

GuidoBartoli commented 5 months ago

I had the same issue with TF 2.16 while using transfer learning on a MobileNet V3, and I solved it simply by calling build() before summary().

size = 224
chans = 3
model.build((None, size, size, chans))
print(model.summary(line_length=88, show_trainable=True))
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ Layer (type)                     ┃ Output Shape              ┃      Param # ┃ Train… ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━┩
│ MobilenetV3large (Functional)    │ (None, 7, 7, 960)         │    2,996,352 │   N    │
├──────────────────────────────────┼───────────────────────────┼──────────────┼────────┤
│ flatten (Flatten)                │ (None, 47040)             │            0 │   -    │
├──────────────────────────────────┼───────────────────────────┼──────────────┼────────┤
│ dropout (Dropout)                │ (None, 47040)             │            0 │   -    │
├──────────────────────────────────┼───────────────────────────┼──────────────┼────────┤
│ dense (Dense)                    │ (None, 1)                 │       47,041 │   Y    │
└──────────────────────────────────┴───────────────────────────┴──────────────┴────────┘
 Total params: 3,043,393 (11.61 MB)
 Trainable params: 47,041 (183.75 KB)
 Non-trainable params: 2,996,352 (11.43 MB)

PS: I confirm that training of the last layer still works even when the output of summary() is incorrect.

DLumi commented 5 months ago

(quoting GuidoBartoli's reply above, including the build()-before-summary() example)

If you take a look at my colab notebook above, I provide an example where explicitly calling build() does not solve the problem of unknown shapes (marked as ?). While the model seems to work fine, this is a visualization bug that I would like the team to address.


mehtankit commented 3 months ago

The model is unbuilt because it doesn't know its input shape. You can use an Input layer for that, or use a previous version of TensorFlow.
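The Input-layer approach mentioned above can be sketched as follows (a minimal sketch using the same Dense sizes as the repro; the Sequential wrapper is just for illustration):

```python
import keras

# Declaring the input shape up front lets Keras build every
# layer immediately, so summary() shows concrete shapes
# instead of "?" and "0 (unbuilt)".
model = keras.Sequential([
    keras.Input(shape=(10,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(5, activation="softmax"),
])
model.summary()
```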

ghsanti commented 3 months ago

Correct @DLumi. It was fixed in 3.4.1.

For 3.4.1, I've simplified the build() method:

import keras
class MyModel(keras.Model):
    def __init__(self):
        super().__init__()
        self.dense1 = keras.layers.Dense(32, activation="relu")
        self.dense2 = keras.layers.Dense(5, activation="softmax")

    def build(self, input_shape):
        # call using random input 
        self.call(keras.random.normal(input_shape))
        self.built = True

    def call(self, inputs):
        x = self.dense1(inputs)
        return self.dense2(x)

m = MyModel()
m.build((1, 2, 3))
m.summary()

When the model is called for the first time, all of its layers are built.