Loss stops changing with custom class

I wrote a custom model for classification instead of forecasting by removing the decoder. I also added static features, both numerical and categorical.

Whenever I train this model, at some epoch (which one depends on the optimizer, etc.) the validation loss stops changing, and from the next epoch on neither the validation nor the training loss changes. Below is an extreme example where this happens immediately (sometimes it takes 2-3 epochs):

```
Epoch 1/100
135/135 - 95s - 703ms/step - f1_score: 0.0275 - loss: 12.8403 - val_f1_score: 0.0201 - val_loss: 14.1408 - learning_rate: 0.0010
Epoch 2/100
135/135 - 27s - 200ms/step - f1_score: 0.0214 - loss: 13.8861 - val_f1_score: 0.0201 - val_loss: 14.1408 - learning_rate: 0.0010
Epoch 3/100
135/135 - 26s - 195ms/step - f1_score: 0.0214 - loss: 13.8861 - val_f1_score: 0.0201 - val_loss: 14.1408 - learning_rate: 0.0010
Epoch 4/100
135/135 - 26s - 193ms/step - f1_score: 0.0214 - loss: 13.8861 - val_f1_score: 0.0201 - val_loss: 14.1408 - learning_rate: 0.0010
Epoch 5/100
135/135 - 26s - 192ms/step - f1_score: 0.0214 - loss: 13.8861 - val_f1_score: 0.0201 - val_loss: 14.1408 - learning_rate: 2.5000e-04
Epoch 6/100
135/135 - 26s - 189ms/step - f1_score: 0.0214 - loss: 13.8861 - val_f1_score: 0.0201 - val_loss: 14.1408 - learning_rate: 2.5000e-04
Epoch 7/100
135/135 - 26s - 193ms/step - f1_score: 0.0214 - loss: 13.8861 - val_f1_score: 0.0201 - val_loss: 14.1408 - learning_rate: 2.5000e-04
Epoch 8/100
135/135 - 25s - 188ms/step - f1_score: 0.0214 - loss: 13.8861 - val_f1_score: 0.0201 - val_loss: 14.1408 - learning_rate: 6.2500e-05
Epoch 9/100
135/135 - 26s - 192ms/step - f1_score: 0.0214 - loss: 13.8861 - val_f1_score: 0.0201 - val_loss: 14.1408 - learning_rate: 6.2500e-05
Epoch 10/100
135/135 - 27s - 196ms/step - f1_score: 0.0214 - loss: 13.8861 - val_f1_score: 0.0201 - val_loss: 14.1408 - learning_rate: 6.2500e-05
Epoch 11/100
135/135 - 26s - 191ms/step - f1_score: 0.0214 - loss: 13.8861 - val_f1_score: 0.0201 - val_loss: 14.1408 - learning_rate: 2.5000e-05
Epoch 12/100
135/135 - 25s - 189ms/step - f1_score: 0.0214 - loss: 13.8861 - val_f1_score: 0.0201 - val_loss: 14.1408 - learning_rate: 2.5000e-05
Epoch 13/100
135/135 - 26s - 196ms/step - f1_score: 0.0214 - loss: 13.8861 - val_f1_score: 0.0201 - val_loss: 14.1408 - learning_rate: 2.5000e-05
10/11 ━━━━━━━━━━━━━━━━━━━━ 0s 94ms/step
```
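A loss that freezes at an exact value is consistent with cross-entropy clipping. This is an assumption, since the `compile` call isn't shown: it assumes a cross-entropy loss with `from_logits=False`. As far as I can tell from the TensorFlow backend, Keras then clips each predicted probability to [ε, 1 − ε] with ε = 1e-7 (`keras.backend.epsilon()`). The clip has zero gradient once a prediction sits on that floor. The sketch below mimics only the clipping step in plain Python. `clipped_ce` and `numeric_grad` are hypothetical helpers, not Keras functions.

```python
import math

EPSILON = 1e-7  # Keras' default fuzz factor, keras.backend.epsilon()


def clipped_ce(p_true: float) -> float:
    """Cross-entropy of one sample, clipping the true-class probability like Keras does."""
    p = min(max(p_true, EPSILON), 1.0 - EPSILON)
    return -math.log(p)


def numeric_grad(p_true: float, h: float = 1e-9) -> float:
    """Central finite-difference slope of clipped_ce with respect to p_true."""
    return (clipped_ce(p_true + h) - clipped_ce(p_true - h)) / (2 * h)


floor_loss = clipped_ce(0.0)  # a saturated wrong prediction hits the clip floor
print(f"loss on the floor: {floor_loss:.4f}")                 # 16.1181
print(f"slope on the floor (p=1e-12): {numeric_grad(1e-12)}")  # 0.0, no gradient flows
print(f"slope off the floor (p=0.1): {numeric_grad(0.1):.2f}")  # close to -1/p = -10

# If every sample were either saturated-correct (loss ~0) or saturated-wrong (~16.118),
# the reported loss would be 16.118 x (fraction of samples stuck on the floor).
for name, loss in [("loss", 13.8861), ("val_loss", 14.1408)]:
    print(f"{name}: {loss / floor_loss:.1%} of samples would be on the floor")
```

Under that assumption, the frozen values in the log would correspond to most samples sitting on the clip floor, where they contribute no gradient. Whether this applies depends on the loss and the output activation actually used.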

This is my classifier:

```python
import numpy as np
import pandas as pd
import tensorflow as tf
from keras import Model
from keras.layers import Concatenate, Conv1D, Embedding, Flatten, Input, MultiHeadAttention

# EmbeddingLayer, VariableSelectionNetwork, RecurrentLayer, AddAndNorm, Gate, GRN and KANLinear
# come from TKAT; build_width is my own helper (imports omitted here).


def TKAT_classify(X: pd.DataFrame, num_embedding: int, num_hidden: int, num_heads: int, n_classes: int,
                  use_tkan: bool = True, filters=32, strides=16):
    X = X.copy()

    # X has dynamic (time series, one array per cell -> object dtype), numerical (static),
    # and categorical (static) features.
    cat_cols = X.columns[X.dtypes == 'category']
    num_dynamic_features = int((X.dtypes == object).sum())
    num_static_features = int(((X.dtypes != object) & (X.dtypes != 'category')).sum())
    num_categorical_features = len(cat_cols)

    # assert num_dynamic_features == 18 and num_static_features == 5 and num_categorical_features == 3

    # Embedding size for each categorical feature
    cat_embed_dict = {col: build_width(X[col].nunique())[-1] for col in cat_cols}

    # Assign a unique integer to each category: 1..K for the K categories, 0 for NaN.
    # .cat.categories never contains NaN, so every index stays below K + 1.
    cat2int = {col: {cat: i for i, cat in enumerate(X[col].cat.categories, 1)}
               for col in cat_cols}
    for col in cat2int:
        cat2int[col][np.nan] = 0

    # One embedding layer per categorical feature: input_dim = K categories + 1 slot for NaN
    categorical_embedding = {
        col: Embedding(len(X[col].cat.categories) + 1, cat_embed_dict[col], name=f'embedding_{col}')
        for col in cat_cols}

    # The sequence length is read from the first cell, so the first column must be a dynamic feature
    dynamic_inputs = Input(shape=(len(X.iloc[0, 0]), num_dynamic_features))
    static_inputs = Input(shape=(num_static_features,))
    categorical_inputs = [Input(shape=(1,), dtype=tf.int32) for _ in range(num_categorical_features)]

    # First, a strided convolution to reduce the number of time steps
    conv_inputs = Conv1D(filters, 3 * strides, strides=strides, padding='same', activation='silu')(dynamic_inputs)

    dynamic_embedding = EmbeddingLayer(num_embedding)(conv_inputs)

    variable_selection = VariableSelectionNetwork(num_hidden, name='vsn_past_features')(dynamic_embedding)

    # Recurrent encoder (no decoder, since this is classification)
    encode_out, *encode_states = RecurrentLayer(num_hidden, return_state=True, use_tkan=use_tkan, name='encoder')(
        variable_selection)

    # Feed forward: gated skip connection
    all_context = AddAndNorm()([Gate()(encode_out), variable_selection])

    # GRN using TKAN before attention
    enriched = GRN(num_hidden)(all_context)

    # Self-attention
    attention_output = MultiHeadAttention(num_heads=num_heads, key_dim=enriched.shape[-1]
                                          )(enriched, enriched, enriched)
    attention_flattened = KANLinear(num_hidden)(Flatten()(attention_output))

    # Concatenate the flattened attention output with the static inputs and the categorical embeddings
    flattened_output = Concatenate()([attention_flattened, static_inputs] +
                                     [Flatten()(categorical_embedding[col](categorical_inputs[i]))
                                      for i, col in enumerate(cat_cols)])

    # Class scores. No softmax is applied here, so these are raw logits:
    # compile the loss with from_logits=True, or add a softmax.
    dense_output = KANLinear(n_classes)(flattened_output)

    return Model(inputs=[dynamic_inputs, static_inputs, *categorical_inputs], outputs=dense_output), cat2int
```
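The categorical inputs have a separate indexing pitfall. Suppose the integer mapping is built by enumerating `X[col].unique()` from 1, and the Embedding is sized with `X[col].nunique() + 1`. Then a column with missing values can produce an index one past the end of the embedding table. This happens because `unique()` also yields NaN, while `nunique()` drops it by default. Per the `tf.gather` documentation, an out-of-range index raises an error on CPU but silently yields zeros on GPU. The toy column below is hypothetical data. The sketch reproduces the mismatch and shows one possible fix using `Series.cat.codes`, which maps NaN to -1, so `codes + 1` reserves index 0 for missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical static categorical column with a missing value
s = pd.Series(["a", np.nan, "b", "a"], dtype="category")

# unique()-based mapping: NaN is enumerated like a regular value
mapping = {cat: i for i, cat in enumerate(s.unique(), 1)}  # 'a' -> 1, NaN -> 2, 'b' -> 3
input_dim = s.nunique() + 1                                # nunique() drops NaN: 2 + 1 = 3
max_index = max(mapping.values())
print(f"input_dim={input_dim}, largest index={max_index}")  # index 3 is out of range for input_dim 3

# Possible fix: category codes are 0..K-1 (and -1 for NaN), so +1 maps NaN to 0
codes = (s.cat.codes + 1).to_numpy()
fixed_input_dim = len(s.cat.categories) + 1
print(codes.tolist(), fixed_input_dim)  # [1, 0, 2, 1] 3
```

With this encoding every index is strictly below `len(categories) + 1`. That matches an `Embedding(len(categories) + 1, ...)` layer.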

I don't know whether this issue is specific to my data or reflects a problem with how the grid is updated in the KAN layers.

remigenet / TKAT, issue #3