Error on model.summary() with PaliGemma

bebechien commented 1 month ago

Describe the bug

model.summary() for PaliGemma will fail when passing show_trainable=True\ Works fine without the parameter.\ Below is an actual error message.

Preprocessor: "pali_gemma_causal_lm_preprocessor"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Tokenizer (type)                                   ┃                                             Vocab # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ pali_gemma_tokenizer (PaliGemmaTokenizer)          │                                             257,152 │
└────────────────────────────────────────────────────┴─────────────────────────────────────────────────────┘
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
[<ipython-input-2-0216a872cf53>](https://localhost:8080/#) in <cell line: 1>()
----> 1 gemma_lm.summary(show_trainable=True)

2 frames
[/usr/local/lib/python3.10/dist-packages/keras/src/utils/summary_utils.py](https://localhost:8080/#) in get_layer_fields(layer, prefix)
    262             fields.append(get_connections(layer))
    263         if show_trainable:
--> 264             if layer.weights:
    265                 fields.append(
    266                     bold_text("Y", color=34)

AttributeError: 'GetItem' object has no attribute 'weights'

To Reproduce https://colab.research.google.com/drive/1o04v1Xok9892KC-TUOuYh8ToU2wY8RPY?usp=sharing

Expected behavior Display the model summary with Trainable info.

Additional context N/A

Would you like to help us fix it? Sure

chunduriv commented 1 month ago

cc: @mattdangerw

miticollo commented 3 weeks ago

This is a semantic issue.

AttributeError is raised here because a keras.ops (like GetItem) doesn't have the weights property. In Keras, a Layer is an Operation, but vice versa is not true. Anyway, a Functional model can also include raw Keras 3 ops. Maybe, a dirty workaround could be to add the weights property to Operation class:

@property
def weights(self):
    weights = []
    return weights

IMO this solution is not semantically correct because weights for an operation don't make any sense.

Or an alternative solution could be to wrap the keras.ops into a Lambda layer. I prefer avoiding this layer because of its limitations but maybe in this case it can be used safe.

Finally this issue should be moved in keras repository.

mattdangerw commented 2 weeks ago

Thanks @miticollo. Agreed, this should move to the keras repo.

Could we just have models.layers filter down to only return Layer and not Operation? That's what is used by the summary utils, and generally speaking I think having model.layers return something that is not a Layer is just unexpected.

Not sure if anything would break, but we could try it out. @miticollo are you interested in taking it on?

miticollo commented 2 weeks ago

Thank u for the answer! At the moment I'm a bit busy with the latest exams at University (one is on Deep Learning). So I can't do it. Anyway using a filter it could be a good a idea. But currently summary shows also the keras.ops. So is this behaviour still intended?

mattdangerw commented 2 weeks ago

@miticollo good point. Perhaps?

Still rubs me the wrong way that model.layers would return something that is not a layer. Users should be able to confidently call any layer method (e.g. layer.count_params()) on everything inside model.layers in a loop. Though potentially we should also have model summary plot ops as well as layers, in which case we should not sure model.layers to loop through everything.

For now I will open a bug in core Keras, assign to myself, and mark open for contributions.

keras-team / keras-nlp

Error on model.summary() with PaliGemma #1654