I made a custom class for classification instead of forecasting, taking out the decoder. I also added static features (including categorical).
Whenever I run this model, at some epoch (depending on the optimizer, etc.) the validation loss stops changing, and from the next epoch on neither the validation nor the training loss changes. Below is an extreme example where this happens immediately (sometimes it takes 2-3 epochs):
This is my classifier:
def TKAT_classify(X: pd.DataFrame, num_embedding: int, num_hidden: int, num_heads: int, n_classes: int,
                  use_tkan: bool = True, filters=32, strides=16):
    X = X.copy()
    cat_cols = X.columns[X.dtypes == 'category']
    # X has dynamic (time series), numerical (static), and categorical (static) features.
    # Get the embedding size for the categorical features
    num_dynamic_features = X[X.columns[X.dtypes == object]].shape[1]
    num_static_features = X[X.columns[(X.dtypes != object) & (X.dtypes != "category")]].shape[1]
    num_categorical_features = X[cat_cols].shape[1]
    # assert num_dynamic_features == 18 and num_static_features == 5 and num_categorical_features == 3
    cat_embed_dict = {col: build_width(X[col].nunique())[-1] for col in cat_cols}
    # assign a unique integer to each category
    cat2int = {col: {cat: i for i, cat in enumerate(X[col].unique(), 1)}
               for col in X[cat_cols]}
    for col in cat2int:
        cat2int[col][np.nan] = 0
    # create embedding layers for each categorical feature
    categorical_embedding = {
        col: Embedding(X[col].nunique() + 1, cat_embed_dict[col], name=f'embedding_{col}')
        for col in cat_cols}
    dynamic_inputs = Input(shape=(len(X.iloc[0, 0]), num_dynamic_features))
    static_inputs = Input(shape=(num_static_features,))
    categorical_inputs = [Input(shape=(1,), dtype=tf.int32) for _ in range(num_categorical_features)]
    # First, convolutional layer to reduce the number of time steps
    conv_inputs = Conv1D(filters, 3 * strides, strides=strides, padding="same", activation='silu')(dynamic_inputs)
    dynamic_embedding = EmbeddingLayer(num_embedding)(conv_inputs)
    variable_selection = VariableSelectionNetwork(num_hidden, name='vsn_past_features')(dynamic_embedding)
    # recurrent encoder
    encode_out, *encode_states = RecurrentLayer(num_hidden, return_state=True, use_tkan=use_tkan, name='encoder')(
        variable_selection)
    # feed forward
    all_context = AddAndNorm()([Gate()(encode_out), variable_selection])
    # GRN using TKAN before attention
    enriched = GRN(num_hidden)(all_context)
    # attention
    attention_output = MultiHeadAttention(num_heads=num_heads, key_dim=enriched.shape[-1]
                                          )(enriched, enriched, enriched)
    attention_flattened = KANLinear(num_hidden)(Flatten()(attention_output))
    # Flatten the attention output and predict the future sequence
    # concatenate the flattened attention output with the static output and the categorical embeddings
    flattened_output = Concatenate()([attention_flattened, static_inputs] +
                                     [Flatten()(categorical_embedding[col](categorical_inputs[i]))
                                      for i, col in enumerate(cat_cols)])
    dense_output = KANLinear(n_classes)(flattened_output)
    return Model(inputs=[dynamic_inputs, static_inputs, *categorical_inputs], outputs=dense_output), cat2int
I don't know if this issue is particular to my data, or reflects an issue with how the grid is updated in the KAN layers.
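To pin down exactly when the freeze starts across runs, something like the sketch below could be applied to the `history` dict of the object returned by Keras `model.fit` (which holds `loss` and `val_loss` lists when validation data is passed). The function name `first_frozen_epoch` and the tolerance `tol` are assumptions for illustration, and the numbers in the usage lines are made up, not taken from an actual run.

```python
import math
from typing import Mapping, Optional, Sequence


def _unchanged(a: float, b: float, tol: float) -> bool:
    # Two NaNs count as "unchanged", so a loss stuck at NaN is also reported.
    if math.isnan(a) and math.isnan(b):
        return True
    return abs(a - b) <= tol


def first_frozen_epoch(history: Mapping[str, Sequence[float]],
                       tol: float = 1e-8) -> Optional[int]:
    """Return the first 0-based epoch at which both losses stop changing.

    Epoch i counts as frozen when `loss` and `val_loss` each differ from
    their value at epoch i - 1 by at most `tol`. Returns None if no such
    epoch exists.
    """
    loss = history["loss"]
    val_loss = history["val_loss"]
    for i in range(1, min(len(loss), len(val_loss))):
        if _unchanged(loss[i], loss[i - 1], tol) and \
                _unchanged(val_loss[i], val_loss[i - 1], tol):
            return i
    return None


# Made-up numbers, only to show the call shape.
example_history = {"loss": [1.0, 0.7, 0.7, 0.7],
                   "val_loss": [0.9, 0.8, 0.8, 0.8]}
frozen_at = first_frozen_epoch(example_history)
```

Comparing the returned epoch across optimizers and seeds would show whether the freeze follows a particular setting or appears regardless.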
I don't think it comes from the KAN part, but try replacing your KANLinear layers with Dense layers and setting use_tkan to False to see whether the problem comes from there.
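A rough way to organize that check is to run all four combinations and note which ones still freeze. The sketch below only enumerates the combinations and reads off the result; `use_kan_linear` is a hypothetical flag (`TKAT_classify` above has no such parameter, so the `KANLinear`/`Dense` swap would have to be added by hand), and the `froze` values in the usage lines are invented for illustration.

```python
from itertools import product
from typing import Dict, List, Tuple

# (use_kan_linear, use_tkan); use_kan_linear is a hypothetical switch
# between KANLinear and Dense, use_tkan is the existing argument.
Config = Tuple[bool, bool]


def ablation_grid() -> List[Config]:
    """All four on/off combinations, starting with the original setup."""
    return list(product([True, False], repeat=2))


def removal_fixes(froze: Dict[Config, bool]) -> List[str]:
    """Name the components whose removal alone stops the freeze.

    `froze` maps each configuration to whether the loss froze in that run.
    Only the two single-swap runs are inspected here; if the run with both
    components removed, (False, False), still freezes, the cause is
    presumably elsewhere (data, optimizer, other layers).
    """
    fixes = []
    if not froze[(False, True)]:
        fixes.append("KANLinear")
    if not froze[(True, False)]:
        fixes.append("TKAN")
    return fixes


grid = ablation_grid()

# Invented outcome: the freeze disappears whenever KANLinear is replaced.
example_froze = {(True, True): True, (True, False): True,
                 (False, True): False, (False, False): False}
implicated = removal_fixes(example_froze)
```

Keeping the seed, optimizer, and data split fixed across the four runs makes the comparison easier to interpret.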