Open condor-cp opened 1 month ago
Experienced the same problem:
with
x = layers.GlobalAveragePooling1D(data_format="channels_last")(x)
the parameter counts are (only the last few rows that differ are shown):
...
global_average_pooling1d (GlobalAveragePooling1D)  (None, 1)    0    ['tf.__operators__.add_7[0][0]']
dense (Dense)                                      (None, 128)  256  ['global_average_pooling1d[0][0]']
dropout_8 (Dropout)                                (None, 128)  0    ['dense[0][0]']
dense_1 (Dense)                                    (None, 2)    258  ['dropout_8[0][0]']
==================================================================================================
Total params: 29,258
Trainable params: 29,258
Non-trainable params: 0
And training stops very quickly with a poor result:
45/45 [==============================] - 24s 545ms/step - loss: 0.6922 - sparse_categorical_accuracy: 0.5208 - val_loss: 0.6952 - val_sparse_categorical_accuracy: 0.4799
42/42 [==============================] - 4s 91ms/step - loss: 0.6930 - sparse_categorical_accuracy: 0.5159
While the page https://keras.io/examples/timeseries/timeseries_classification_transformer/
shows:
│ global_average_poo… │ (None, 500) │ 0 │ add_7[0][0] │
│ (GlobalAveragePool… │ │ │ │
├─────────────────────┼───────────────────┼─────────┼──────────────────────┤
│ dense (Dense) │ (None, 128) │ 64,128 │ global_average_pool… │
├─────────────────────┼───────────────────┼─────────┼──────────────────────┤
│ dropout_12 │ (None, 128) │ 0 │ dense[0][0] │
│ (Dropout) │ │ │ │
├─────────────────────┼───────────────────┼─────────┼──────────────────────┤
│ dense_1 (Dense) │ (None, 2) │ 258 │ dropout_12[0][0] │
└─────────────────────┴───────────────────┴─────────┴──────────────────────┘
Total params: 93,130 (363.79 KB)
Trainable params: 93,130 (363.79 KB)
Non-trainable params: 0 (0.00 B)
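The two parameter counts by themselves pin down what shape the pooling layer produced: a `Dense(128)` layer on a d-dimensional input has d * 128 weights plus 128 biases. A quick check (plain Python, not the tutorial's code):

```python
def dense_params(input_dim, units):
    # weights (input_dim * units) + biases (units) of a fully connected layer
    return input_dim * units + units

# 256 params  -> the pooled output must have been (None, 1)
print(dense_params(1, 128))    # 256

# 64,128 params -> the pooled output must have been (None, 500),
# matching the summary shown on the keras.io page
print(dense_params(500, 128))  # 64128
```

So the published example's `dense` layer saw a 500-dimensional input, while the run above fed it a 1-dimensional one.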
Hi @fchollet,
This line needs to be changed to `channels_first` to reproduce the good result shown on:
https://keras.io/examples/timeseries/timeseries_classification_transformer/
Can you make this change? Could you also add an explanation of why `channels_first`
is needed instead of `channels_last`?
The input training data's shape is (3601, 500, 1),
i.e. channels_last for sure; so why do we need to set `channels_first`
to get the good training result?
Thanks.
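The shape difference can be reproduced without TensorFlow. `GlobalAveragePooling1D` averages over the "steps" axis, and `data_format` decides which axis that is: `channels_last` means input is (batch, steps, features) and pooling happens over axis 1; `channels_first` means input is (batch, features, steps) and pooling happens over the last axis. A NumPy sketch with the tutorial's (batch, 500, 1) data (illustrative only, the array values are random):

```python
import numpy as np

x = np.random.rand(8, 500, 1)  # (batch, 500 time steps, 1 feature)

# data_format="channels_last": data read as (batch, steps, features),
# so the 500 time steps are averaged away -> (8, 1).
# The Dense(128) layer then sees only 1 input, giving 256 params.
pooled_last = x.mean(axis=1)
print(pooled_last.shape)   # (8, 1)

# data_format="channels_first": data read as (batch, features, steps),
# so the single "step" is averaged -> (8, 500).
# All 500 time steps survive into Dense(128), giving 64,128 params.
pooled_first = x.mean(axis=2)
print(pooled_first.shape)  # (8, 500)
```

With only one feature, the `channels_first` "pooling" averages a single value, so it effectively just drops the last axis rather than summarizing anything.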
Running the example shows an inconsistency in the number of parameters and in model performance compared to what is displayed on the page.
It seems that the global average pooling layer's `data_format` should be set to "channels_first" to reach the same number of parameters and an accuracy consistent with the displayed console log (tried on Google Colab).
https://github.com/keras-team/keras-io/blob/2ac94c4c4da54c1f7c80fea7f18832fd54c6be28/examples/timeseries/timeseries_classification_transformer.py#L115
But then there is no actual pooling, only removal of the feature dimension, so maybe another layer should be used.
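To make that last point concrete: because there is exactly one feature, the `channels_first` pooling is numerically identical to just squeezing out the feature axis (so something like a reshape/squeeze layer would state the intent more honestly than a mislabeled `data_format`). A NumPy sketch of the equivalence, under that single-feature assumption:

```python
import numpy as np

x = np.random.rand(8, 500, 1)  # (batch, steps, 1 feature), as in the tutorial

# What GlobalAveragePooling1D(data_format="channels_first") computes here:
# the mean over a length-1 axis, i.e. the identity on that value.
pooled = x.mean(axis=2)                # (8, 500)

# Dropping the feature axis gives exactly the same tensor.
squeezed = np.squeeze(x, axis=-1)      # (8, 500)

assert np.allclose(pooled, squeezed)   # "pooling" here is just a squeeze
```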