noureldien / timeception

Timeception for Complex Action Recognition, CVPR 2019 (Oral Presentation)
https://noureldien.com/research/timeception/
GNU General Public License v3.0
157 stars, 33 forks

Placement of TC module #5

Closed: Raazzta closed this issue 5 years ago

Raazzta commented 5 years ago

Hi @noureldien, what happens if the TC module is placed right after the Input layer (before the base model)? Will it learn a motion representation of the video? Is that plausible? I'd like your thoughts/experiences on this. Thanks.

noureldien commented 5 years ago

Hi Raazzta, thanks for asking. You will need to modify TC slightly for it to work there. The convolutions in TC model temporal and channel correlations, but they do not convolve over space. Without spatial convolution, it might not work well. Thus, you will need to change the kernel size of the depthwise temporal convolution from (T, 1, 1) to (T, 3, 3), where T ∈ {3, 5, 7}, because TC uses multi-scale convolutions.

This work might help you, because they also use these depthwise convolutions. Have a look at Figure 2 here: https://arxiv.org/abs/1904.02811
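
As a rough illustration of the kernel change suggested above, the following sketch compares a temporal-only depthwise kernel (T, 1, 1) with the spatio-temporal variant (T, 3, 3) for T in {3, 5, 7}. It is not code from the Timeception repository: the helper names, the channel count (64) and the feature-map size (32 x 56 x 56) are assumptions chosen for illustration, and it assumes stride 1, zero padding of k // 2 per axis, and no bias.

```python
# Illustrative sketch only; not code from the Timeception repository.
TEMPORAL_SIZES = (3, 5, 7)  # multi-scale temporal kernel sizes T from the thread


def same_padding(kernel):
    """Zero padding per axis that keeps the size unchanged (odd kernels, stride 1)."""
    return tuple(k // 2 for k in kernel)


def output_shape(input_shape, kernel):
    """(T, H, W) after a stride-1 convolution padded with same_padding()."""
    pad = same_padding(kernel)
    return tuple(n + 2 * p - k + 1 for n, p, k in zip(input_shape, pad, kernel))


def depthwise_weights(channels, kernel):
    """Weights in a depthwise 3D convolution: one filter per channel, no bias."""
    k_t, k_h, k_w = kernel
    return channels * k_t * k_h * k_w


def compare_kernels(channels, input_shape):
    """List (kernel, output shape, weight count) for both kernel variants."""
    rows = []
    for t in TEMPORAL_SIZES:
        for kernel in ((t, 1, 1), (t, 3, 3)):
            rows.append((kernel, output_shape(input_shape, kernel),
                         depthwise_weights(channels, kernel)))
    return rows


if __name__ == "__main__":
    # Hypothetical sizes: 64 channels, 32 timesteps, 56 x 56 pixels.
    for kernel, shape, weights in compare_kernels(64, (32, 56, 56)):
        print(kernel, shape, weights)
```

With this padding, both variants preserve the (T, H, W) shape, and the (T, 3, 3) kernel uses 9 times as many weights as (T, 1, 1) for the same T. In PyTorch, a depthwise 3D convolution of this kind can be written as `torch.nn.Conv3d(C, C, kernel_size=(T, 3, 3), padding=(T // 2, 1, 1), groups=C)`, where `C` is the number of channels.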

Raazzta commented 5 years ago

Hi @noureldien, thanks for your answer. Actually, I have a separate network to learn the spatial dimension. Thus, I want another network to learn the temporal-only dimension (only convolving on the channel axis -> (T, 1, 1)), just like your TC block does. But I have some "weird" ideas: putting a TC block after the Input layer and in the middle of the base model. That's why I need your opinion on whether this makes sense or not. Thanks.
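
To make the temporal-only (T, 1, 1) case concrete, here is a minimal pure-Python sketch of a depthwise temporal convolution. It is an illustration under stated assumptions, not the TC block itself: the function name `temporal_depthwise_conv`, the nested-list layout `video[c][t][h][w]`, stride 1, and zero padding along time are all choices made for this example.

```python
# Illustrative sketch only; not the Timeception implementation.
def temporal_depthwise_conv(video, kernels):
    """Apply one odd-length temporal filter per channel, i.e. a (T, 1, 1) kernel.

    video[c][t][h][w] holds the input; kernels[c] is the filter for channel c.
    Uses stride 1 and zero padding along time, so the output keeps the input shape.
    """
    out = []
    for c, frames in enumerate(video):
        kernel = kernels[c]
        half = len(kernel) // 2
        n_t, n_h, n_w = len(frames), len(frames[0]), len(frames[0][0])
        channel_out = []
        for t in range(n_t):
            frame = [[0.0] * n_w for _ in range(n_h)]
            for i, weight in enumerate(kernel):
                src = t + i - half  # source timestep for this kernel tap
                if 0 <= src < n_t:  # taps outside the clip read zero padding
                    for y in range(n_h):
                        for x in range(n_w):
                            frame[y][x] += weight * frames[src][y][x]
            channel_out.append(frame)
        out.append(channel_out)
    return out


# One channel, three frames, two pixels; the second pixel is 10x the first.
video = [[[[1.0, 10.0]], [[2.0, 20.0]], [[4.0, 40.0]]]]
difference_filter = [[-1.0, 0.0, 1.0]]  # hypothetical T = 3 temporal kernel
print(temporal_depthwise_conv(video, difference_filter))
```

Each output pixel depends only on the same pixel at neighbouring timesteps, so in this toy input the second pixel's output stays exactly 10 times the first pixel's. No information is exchanged between spatial locations, which is the limitation the (T, 3, 3) suggestion addresses.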

noureldien commented 5 years ago

In the middle of the network, that I get. But at the bottom of the network, right after the input, I don't get it. Maybe it's too early for the temporal convolution to learn something meaningful? What is the problem you're working on? Please send me your e-mail so we can have a chat; that would be better. My e-mail is nhussein AT uva.nl

Raazzta commented 5 years ago

Will contact you soon, thanks!