locuslab / TCN

Sequence modeling benchmarks and temporal convolutional networks
https://github.com/locuslab/TCN
MIT License
4.12k stars, 874 forks

Is TCN suitable for spatio-temporal data? #73

Open taheramii opened 1 year ago

taheramii commented 1 year ago

I have multi-dimensional spatio-temporal data in which the spatial part is represented by 2D matrices (like an RGB image). How can I feed this data to the TCN?

alexmehta commented 1 year ago

I have a similar question. I want to use a TCN for video data. Anyone have any ideas?

alexmehta commented 1 year ago

I have found a solution using an encoder.

taheramii commented 1 year ago

I was wondering if you could share the solution?

Thanks, Taher

zeroocean commented 1 year ago

Did you find a solution?

alexmehta commented 1 year ago

Just use any encoder and set the TCN's input channels to the encoder's output dimension for one time step. For example, if you have a CNN that takes images of shape (n_imgs, 112, 112) and outputs (n_imgs, channels), you simply feed that output into the TCN, making sure that n_channels = channels and that n_imgs is the sequence length, not the channels (this may require reshaping, since the TCN expects input of shape (batch, channels, length)).
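
A minimal PyTorch sketch of this idea, for anyone who wants to check the shapes. Everything here is an assumption for illustration: `FrameEncoder` and `encode_clip` are hypothetical names, the frames are taken to be 3-channel 112x112 images, and a plain `nn.Conv1d` stands in for the temporal model (it is not causal and is only there to confirm the input shape is accepted). With this repo you would pass the same tensor to `TemporalConvNet` with `num_inputs` equal to the encoder's output size.

```python
import torch
import torch.nn as nn


class FrameEncoder(nn.Module):
    """Hypothetical per-frame CNN: (n_imgs, 3, H, W) -> (n_imgs, channels)."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pool -> (n_imgs, channels, 1, 1)
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # Drop the two singleton spatial dims -> (n_imgs, channels)
        return self.features(frames).flatten(1)


def encode_clip(encoder: nn.Module, clip: torch.Tensor) -> torch.Tensor:
    """Turn one clip (n_imgs, 3, H, W) into a (1, channels, n_imgs) tensor."""
    feats = encoder(clip)                        # (n_imgs, channels)
    return feats.transpose(0, 1).unsqueeze(0)    # time becomes the last axis


encoder = FrameEncoder(channels=64)
clip = torch.randn(16, 3, 112, 112)              # 16 frames, assumed RGB 112x112
tcn_input = encode_clip(encoder, clip)
print(tcn_input.shape)                           # torch.Size([1, 64, 16])

# Stand-in temporal layer (NOT causal); used only to check the shape works.
temporal = nn.Conv1d(in_channels=64, out_channels=32, kernel_size=3, padding=1)
out = temporal(tcn_input)                        # (1, 32, 16)
```

The key point is the transpose: the encoder treats n_imgs as a batch of independent frames, while the temporal model treats it as the sequence length.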

Lmk if that makes sense.

chc-tw commented 1 year ago

You are correct that we can use any CNN backbone to transform the input images (n_imgs, W, H, C) into feature maps (n_imgs, W', H', C'), where W', H', and C' are the dimensions of the last feature map. To collapse the spatial dimensions W' and H', we can use either flattening, which gives (n_imgs, W'·H'·C'), or global average pooling (recommended), which gives (n_imgs, C'). Afterward, we can feed the transformed data into the TCN.
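
To make the two reductions concrete, here is a small NumPy sketch. The feature-map size (7x7 with 512 channels) and the random data are assumptions chosen only for illustration; a real backbone would supply `feature_maps`.

```python
import numpy as np

# Hypothetical last feature map of a backbone, channels-last: (n_imgs, W', H', C')
n_imgs, w, h, c = 16, 7, 7, 512
feature_maps = np.random.rand(n_imgs, w, h, c).astype(np.float32)

# Global average pooling: average over the spatial axes W' and H'.
pooled = feature_maps.mean(axis=(1, 2))        # (n_imgs, C')

# Flattening keeps every spatial position, so the feature size grows.
flattened = feature_maps.reshape(n_imgs, -1)   # (n_imgs, W' * H' * C')

# A Conv1d-based TCN reads (batch, channels, length), so time goes last.
tcn_input = pooled.T[np.newaxis, ...]          # (1, C', n_imgs)

print(pooled.shape, flattened.shape, tcn_input.shape)
# (16, 512) (16, 25088) (1, 512, 16)
```

Pooling keeps the TCN's input channel count at C', while flattening multiplies it by W'·H', which is one practical reason to prefer pooling.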

Please let me know if you need any further clarification.

Wei4lei commented 1 year ago

How does it perform?
