Open condor-cp opened 1 month ago
Experienced the same problem:
with
x = layers.GlobalAveragePooling1D(data_format="channels_last")(x)
the parameter counts are (only the last few rows that differ are shown):
...
global_average_pooling1d (GlobalAveragePooling1D)  (None, 1)    0    ['tf.__operators__.add_7[0][0]']
dense (Dense)                                      (None, 128)  256  ['global_average_pooling1d[0][0]']
dropout_8 (Dropout)                                (None, 128)  0    ['dense[0][0]']
dense_1 (Dense)                                    (None, 2)    258  ['dropout_8[0][0]']
==================================================================================================
Total params: 29,258
Trainable params: 29,258
Non-trainable params: 0
And training stops very quickly with a poor result:
45/45 [==============================] - 24s 545ms/step - loss: 0.6922 - sparse_categorical_accuracy: 0.5208 - val_loss: 0.6952 - val_sparse_categorical_accuracy: 0.4799
42/42 [==============================] - 4s 91ms/step - loss: 0.6930 - sparse_categorical_accuracy: 0.5159
While the page https://keras.io/examples/timeseries/timeseries_classification_transformer/
shows:
│ global_average_poo… │ (None, 500) │ 0 │ add_7[0][0] │
│ (GlobalAveragePool… │ │ │ │
├─────────────────────┼───────────────────┼─────────┼──────────────────────┤
│ dense (Dense) │ (None, 128) │ 64,128 │ global_average_pool… │
├─────────────────────┼───────────────────┼─────────┼──────────────────────┤
│ dropout_12 │ (None, 128) │ 0 │ dense[0][0] │
│ (Dropout) │ │ │ │
├─────────────────────┼───────────────────┼─────────┼──────────────────────┤
│ dense_1 (Dense) │ (None, 2) │ 258 │ dropout_12[0][0] │
└─────────────────────┴───────────────────┴─────────┴──────────────────────┘
Total params: 93,130 (363.79 KB)
Trainable params: 93,130 (363.79 KB)
Non-trainable params: 0 (0.00 B)
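The two parameter counts by themselves pin down what shape the pooling layer produced: a `Dense(128)` layer on a d-dimensional input has d * 128 weights plus 128 biases. A quick check (plain Python, not the tutorial's code):

```python
def dense_params(input_dim, units):
    # weights (input_dim * units) + biases (units) of a fully connected layer
    return input_dim * units + units

# 256 params  -> the pooled output must have been (None, 1)
print(dense_params(1, 128))    # 256

# 64,128 params -> the pooled output must have been (None, 500),
# matching the summary shown on the keras.io page
print(dense_params(500, 128))  # 64128
```

So the published example's `dense` layer saw a 500-dimensional input, while the run above fed it a 1-dimensional one.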
Hi @fchollet,
This line needs to be changed to `channels_first` to reproduce the good result shown on:
https://keras.io/examples/timeseries/timeseries_classification_transformer/
Can you make this change? Could you also add an explanation of why `channels_first`
is needed instead of `channels_last`?
The input training data's shape is (3601, 500, 1),
i.e. channels_last for sure; so why do we need to set `channels_first`
to get the good training result?
Thanks.
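The shape difference can be reproduced without TensorFlow. `GlobalAveragePooling1D` averages over the "steps" axis, and `data_format` decides which axis that is: `channels_last` means input is (batch, steps, features) and pooling happens over axis 1; `channels_first` means input is (batch, features, steps) and pooling happens over the last axis. A NumPy sketch with the tutorial's (batch, 500, 1) data (illustrative only, the array values are random):

```python
import numpy as np

x = np.random.rand(8, 500, 1)  # (batch, 500 time steps, 1 feature)

# data_format="channels_last": data read as (batch, steps, features),
# so the 500 time steps are averaged away -> (8, 1).
# The Dense(128) layer then sees only 1 input, giving 256 params.
pooled_last = x.mean(axis=1)
print(pooled_last.shape)   # (8, 1)

# data_format="channels_first": data read as (batch, features, steps),
# so the single "step" is averaged -> (8, 500).
# All 500 time steps survive into Dense(128), giving 64,128 params.
pooled_first = x.mean(axis=2)
print(pooled_first.shape)  # (8, 500)
```

With only one feature, the `channels_first` "pooling" averages a single value, so it effectively just drops the last axis rather than summarizing anything.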
Running the example shows an inconsistency in the number of parameters and in model performance compared to what is displayed on the page.
It seems that the global average pooling layer's `data_format` should be set to "channels_first" to reach the same number of parameters and an accuracy consistent with the displayed console log (tried on Google Colab).
https://github.com/keras-team/keras-io/blob/2ac94c4c4da54c1f7c80fea7f18832fd54c6be28/examples/timeseries/timeseries_classification_transformer.py#L115
But then there is no actual pooling, only removal of the feature dimension, so maybe another layer should be used.
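To make that last point concrete: because there is exactly one feature, the `channels_first` pooling is numerically identical to just squeezing out the feature axis (so something like a reshape/squeeze layer would state the intent more honestly than a mislabeled `data_format`). A NumPy sketch of the equivalence, under that single-feature assumption:

```python
import numpy as np

x = np.random.rand(8, 500, 1)  # (batch, steps, 1 feature), as in the tutorial

# What GlobalAveragePooling1D(data_format="channels_first") computes here:
# the mean over a length-1 axis, i.e. the identity on that value.
pooled = x.mean(axis=2)                # (8, 500)

# Dropping the feature axis gives exactly the same tensor.
squeezed = np.squeeze(x, axis=-1)      # (8, 500)

assert np.allclose(pooled, squeezed)   # "pooling" here is just a squeeze
```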