philipperemy / keras-tcn

Keras Temporal Convolutional Network.
MIT License

Clarification about input_dims #166

Closed puneat closed 3 years ago

puneat commented 3 years ago

Hey,

I've been trying to use TCNs on images and need some clarification about the input shape. My image is a temporal image of size (259 x 50), representing (time_steps x height); visually it looks somewhat like an ordinary spectrogram. I understand that a TCN is essentially a 1-D FCN with causal convolutions, which takes an input of size (batch_size, timesteps, input_dims).

My doubt is: when I provide a batch of 20 images of size (259 x 50) to the TCN, which expects input of shape (batch_size, timesteps, input_dims), does that mean my image is broken into 50 different 1-D arrays, or is something else happening?

Quick note: I was able to train an image classification model with this format of images using a TCN, reaching an accuracy of 95%.

philipperemy commented 3 years ago

To answer your question, the input dim of the Keras TCN is the input dim of the underlying Conv1D layers. To draw a parallel with a simple ConvNet, it plays the role of the color channels (dim = 3). Each dim here is independent.

The input is (batch_size, time_steps, input_dim) and the time dimension is unrolled.

In your case, height (50) is the input_dim and 259 is your time_steps, which is long enough that you should benefit from the advantages of the TCN. If not, you might have to tune it a bit, or it may simply be that information further than 50 steps away is not relevant for the predictions!

As a summary: give a tensor of shape (20, 259, 50) to the TCN and it should work! I think that's what you're already doing.
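To make that concrete, here is a minimal sketch (mine, not part of the original answer) that feeds (batch, 259, 50) tensors into a TCN-based classifier; the 10-class head and all hyperparameters are placeholder assumptions:

import numpy as np
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model
from tcn import TCN  # pip install keras-tcn

time_steps, input_dim = 259, 50            # 259 time steps, height 50 used as channels
inputs = Input(shape=(time_steps, input_dim))
x = TCN(nb_filters=64, return_sequences=False)(inputs)  # output: (batch, nb_filters)
outputs = Dense(10, activation='softmax')(x)             # hypothetical 10 classes

model = Model(inputs, outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Dummy batch of 20 "images" treated as multivariate time series.
x_batch = np.random.rand(20, time_steps, input_dim)
print(model.predict(x_batch).shape)        # (20, 10)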

"does it mean that my image is broken into 50 different 1D arrays or is something else happening?"

Somehow, yes. You can refer to this doc for more info: https://www.tensorflow.org/api_docs/python/tf/nn/conv1d. 50 is the in_channels variable in that doc. You just increase your filters in that second dimension, which is like considering them independently (if I'm right).
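A small shape demonstration (my own sketch, not from the thread) of how tf.nn.conv1d treats the last axis as channels: the kernel has shape (kernel_width, in_channels, out_channels), so all 50 input channels feed into each output filter.

import tensorflow as tf

x = tf.random.normal([20, 259, 50])     # (batch, time_steps, in_channels=50)
kernel = tf.random.normal([3, 50, 64])  # (kernel_width, in_channels=50, filters=64)
y = tf.nn.conv1d(x, kernel, stride=1, padding='SAME')
print(y.shape)                          # (20, 259, 64)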

puneat commented 3 years ago

That clears it up! Basically, the input tensor (batch_size, time_steps, input_dims) is convolved with 1D convolutions across the input_dims (50) channels, and the output comes out with shape (batch_size, time_steps, nb_filters).
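A quick shape check of that (a sketch I put together, assuming keras-tcn's TCN layer with return_sequences=True so the time axis is kept):

import numpy as np
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tcn import TCN

inputs = Input(shape=(259, 50))
seq_out = TCN(nb_filters=64, return_sequences=True)(inputs)
model = Model(inputs, seq_out)
print(model.predict(np.random.rand(20, 259, 50)).shape)  # (20, 259, 64)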

Thanks a lot!