You can refer to the guide here: https://github.com/philipperemy/keras-tcn#how-do-i-choose-the-correct-set-of-parameters-to-configure-my-tcn-layer.
Try to print your receptive field with:
print('Receptive field size =', tcn_layer.receptive_field)
Also print a summary of both the TCN and the LSTM and try to keep roughly the same number of weights. That will give you an idea of whether your model is over-parameterized.
And just match the receptive field to your 450 time steps (aim a bit higher). In your case, I think your RF is very high, hence very unstable training with the amount of data you have.
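For illustration, a minimal sketch of checking the receptive field against the 450 time steps (the formula in the comment is the one from the README linked above and reproduces the 1191 reported below; the kernel_size/dilations values here are just one example that lands a bit above 450, not a recommendation):
from tcn import TCN

time_steps = 450

# Receptive field per the keras-tcn README:
#   RF = 1 + 2 * (kernel_size - 1) * nb_stacks * sum(dilations)
tcn_layer = TCN(nb_filters=64,
                kernel_size=3,
                nb_stacks=1,
                dilations=(1, 2, 4, 8, 16, 32, 64),  # sum = 127
                return_sequences=True)

print('Receptive field size =', tcn_layer.receptive_field)  # 1 + 2*2*1*127 = 509
print('Covers all 450 time steps:', tcn_layer.receptive_field >= time_steps)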
Thanks so much, Philippe. I started tweaking some of the parameters, and the training set prediction looks much better. There is much higher variance early on (e.g. at t = -15 s) than at t = 0, so this asymptotic shape is what we'd like to see.
But I'm still getting overfitting (or rather, a lack of generalization) on the test sets. One thing I should mention is that we have very few samples (usually 50-100), and I'm worried that overfitting will be difficult to avoid.
I will keep tweaking, but here are a couple of quick questions that would help, as there are many params:
Our time steps are 450, and RF 1,191. Is that still too much?
Also, it wasn't completely clear which LSTM you were referring to. For now, I'm only using a TCN + dense layer. Are you suggesting we add an LSTM as well? I tested this before, and it wasn't clearly helping.
Any other obvious tweaks?
Here's the initialization code:
from tcn import TCN

# if time_steps > tcn_layer.receptive_field, then we should not
# be able to solve this task.
batch_size, time_steps, input_dim = None, 450, 6

dilations = (1, 4, 16, 64)

tcn_layer = TCN(nb_filters=64,
                kernel_size=8,
                input_shape=(time_steps, input_dim),
                return_sequences=True,
                dilations=dilations,
                dropout_rate=0.5,
                use_batch_norm=True,
                use_layer_norm=False,
                use_weight_norm=False)
And the model params:
Receptive field size = 1191
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
tcn_1 (TCN)                  (None, 450, 64)           235456
_________________________________________________________________
dense_1 (Dense)              (None, 450, 1)            65
=================================================================
Total params: 235,521
Trainable params: 234,497
Non-trainable params: 1,024
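For reference, a minimal sketch of the Sequential model a summary like the one above would come from (the sigmoid output and binary cross-entropy loss are assumptions based on the 0/1 behavior label described in the original post quoted below, not something stated here):
from tcn import TCN
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

model = Sequential([
    TCN(nb_filters=64,
        kernel_size=8,
        dilations=(1, 4, 16, 64),
        dropout_rate=0.5,
        use_batch_norm=True,
        return_sequences=True,       # one prediction per time step
        input_shape=(450, 6)),
    Dense(1, activation='sigmoid')   # per-time-step probability of the behavior label
])
model.compile(optimizer='adam', loss='binary_crossentropy')
model.summary()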
@catubc I think it's going to be difficult to apply a TCN to your dataset without overfitting. The time series are very long (450 steps) and you have just 50-100 of them. It's quite challenging!
Hi Philippe, thank you again for taking the time to respond. It seems some parameters work better than others. For example, the dropout rate seems to have a pretty narrow useful range (e.g. 0.05-0.1) for some parameter pairs. It also seems like the smaller models don't do well (even on test data), so higher depth/n_params seems to do better (but I haven't exhausted the options).
I will try to add regularization, but it seems your TCN layer doesn't accept the usual Keras regularizer arguments, so I guess I can only apply it to the dense layer. That doesn't sound optimal; I would think we need to constrain the params in the TCN layer, not the dense one.
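For reference, a minimal sketch of what regularizing only the dense head could look like (the L2 factor 1e-4 is an arbitrary placeholder, and tcn_layer is the layer defined in the initialization code above):
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.regularizers import l2

model = Sequential([
    tcn_layer,                           # the TCN layer defined above, left unregularized
    Dense(1, activation='sigmoid',
          kernel_regularizer=l2(1e-4))   # L2 penalty applied to the dense weights only
])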
Thanks again for the help!
I get the same problem but maybe for a different reason: https://github.com/philipperemy/keras-tcn/issues/204
Good to know. For my data, the last plot above is actually not correct (I accidentally trained on the test data due to a randomization step), so I'm back to square one in finding params that work...
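For what it's worth, a minimal sketch of one way to guard against that kind of leak, assuming the trials live in hypothetical arrays X and y: split whole trials first, and only shuffle within the training portion afterwards.
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 80 trials, 450 time steps, 6 PCs, one 0/1 label per time step.
X = np.random.randn(80, 450, 6)
y = np.random.randint(0, 2, size=(80, 450, 1))

# Split whole trials before any shuffling so no test trial ever reaches training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)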
Describe the bug
Hello,
This is not a bug; it's more of a usage question. Sorry for using the "bug" report, but I wasn't sure what the correct field was.
I'm trying to predict a behavior label (0=no behavior; 1=behavior) from temporal neural data and am having a hard time getting the model to converge. My data has shape [n_samples, n_timesteps, n_dimensions], where n_dimensions can be from 1-10 (these are Principal Components so even n_dimensions = 1 captures a lot of the structure in the data).
Linear classifiers do relatively well in predicting an upcoming behavior as far back as 5-10 seconds prior to behavior and vanilla LSTMs also have increasingly better prediction closer to behavior. So I know there is structure in the data that can be learned, and was hoping that some form of temporal convolution would recover the structure better than these other methods.
I'm setting up the model like this (the initialization code shown earlier in this thread):
And after compiling the model I get this (the model summary shown earlier):
The model gets overtrained after a few hundred epochs (i.e. it does OK on training data), but the val_loss is still very large and it performs at chance or worse on test data (I also attached an image of the average prediction at each time point).
Any advice on tweaking the TCN input params to improve prediction?
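Separate from the TCN parameters themselves, one generic safeguard against training a few hundred epochs past the point of overfitting is early stopping on val_loss. A minimal sketch (the patience value is an arbitrary placeholder; model, X_train, y_train are as in the sketches above):
from tensorflow.keras.callbacks import EarlyStopping

# Stop once val_loss has not improved for 20 epochs and keep the best weights.
early_stop = EarlyStopping(monitor='val_loss', patience=20,
                           restore_best_weights=True)

model.fit(X_train, y_train,
          validation_split=0.2,
          epochs=500,
          callbacks=[early_stop])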