vikrant7 / mobile-vod-bottleneck-lstm

Implementation of Mobile Video Object Detection with Temporally-Aware Feature Maps using PyTorch

Why init BottleneckLSTM with height=10 and width=10 #4

Closed turboxin closed 5 years ago

turboxin commented 5 years ago

hi @vikrant7, thanks for sharing your implementation! It's more of a question than an issue: in mvod_bottleneck_lstm1.py line 303, you init BottleneckLSTM with height=10 and width=10:

self.bottleneck_lstm1 = BottleneckLSTM(input_channels=1024*alpha, hidden_channels=256*alpha, height=10, width=10, batch_size=batch_size)

Why is that not the height and width of the input image? Correct me if I'm wrong since I don't know conv-lstm well.

Thanks in advance!

vikrant7 commented 5 years ago

Hi @turboxin, thanks for asking this query. Here, we are placing the Bottleneck LSTM layer after the conv13 layer, i.e. after the MobileNet v1 feature extractor.
Up to this layer, the convolution operations have reduced the spatial size of the input tensor according to their strides. Hence, at this level the feature map is 10×10, not the size of the input image. Subsequently, the input for Bottleneck LSTM layer 2 will be 5×5, and so on.
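A minimal sketch of the size arithmetic (not the repo's actual backbone): assuming a 320×320 input frame, a stack of stride-2 stages with an overall stride of 32 yields a 10×10 feature map, and one more stride-2 layer halves it to 5×5. The toy conv stack below is only a stand-in for MobileNet v1 to illustrate the spatial reduction.

```python
import torch
import torch.nn as nn

def stride32_backbone():
    # Five stride-2 stages -> overall stride of 32 (320 / 32 = 10).
    # Stand-in for the MobileNet v1 feature extractor, not the real one.
    layers = []
    in_ch = 3
    for out_ch in (32, 64, 128, 256, 512):
        layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
                   nn.ReLU(inplace=True)]
        in_ch = out_ch
    return nn.Sequential(*layers)

x = torch.randn(1, 3, 320, 320)   # dummy video frame (assumed 320x320 input)
feat = stride32_backbone()(x)     # feature map at the level of conv13
print(feat.shape)                 # torch.Size([1, 512, 10, 10]) -> height=10, width=10

# One more stride-2 conv halves the map again -> the 5x5 size seen by
# the second Bottleneck LSTM layer.
feat2 = nn.Conv2d(512, 512, kernel_size=3, stride=2, padding=1)(feat)
print(feat2.shape)                # torch.Size([1, 512, 5, 5])
```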

turboxin commented 5 years ago

Oh right... it's a silly question, I have to say; I thought 10 was the sequence length. Thanks for clarifying that for me!