Closed: Hao-Liu closed this issue 4 years ago
Hi Hao Liu, thanks for your question and apologies for the long delay in getting back to you. I believe you are referring to this line? https://github.com/rdroste/unisal/blob/17ab7ddb40cae5196423aa31ba7e4eb2c4267581/unisal/models/MobileNetV2.py#L171 This is the same as having a strided convolution, i.e., discarding every second element along each spatial dimension. We apply the stride manually after the convolution because we copy the full-resolution features before down-sampling and use them for the skip connection that feeds into the decoder. Does that answer your question?
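To make the mechanics concrete, here is a minimal PyTorch sketch of the pattern described above (not the actual unisal code; `conv_block`, `skip_feat`, and the tensor sizes are just for illustration):

```python
import torch
import torch.nn as nn

# Run the convolution at stride 1, keep the full-resolution output for the
# decoder skip connection, then down-sample by slicing every second element.
conv_block = nn.Conv2d(32, 64, kernel_size=3, padding=1, stride=1)

x = torch.randn(1, 32, 64, 64)       # dummy feature map
feat = conv_block(x)                  # full resolution: (1, 64, 64, 64)
skip_feat = feat                      # copied before down-sampling, fed to the decoder
feat = feat[:, :, ::2, ::2]           # manual stride 2 over both spatial dims
print(skip_feat.shape, feat.shape)    # (1, 64, 64, 64) (1, 64, 32, 32)
```

Since slicing the stride-1 output like this produces exactly the values a stride-2 convolution would compute, the forward and backward passes behave just like a standard strided MobileNetV2 block; the only difference is that the full-resolution features remain available for the skip connection.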
@Hao-Liu did my reply answer your question? Let me know if I can provide any further clarifications.
Closing this issue due to inactivity
Great work! But I'm curious about the choice of the unusual pooling method in the MobileNet backbone. You didn't use a standard pooling method like average/max pooling or pooling via a strided convolution, but chose to directly slice out a quarter of the input. I thought this would block the gradient for the other 75% of the inputs during backprop and make those inputs useless, which doesn't make sense to me at all. Is this an intentional design or an arbitrary choice?
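A quick way to see why the gradient concern does not apply is to check the equivalence pointed out in the reply above: a stride-1 convolution followed by `[:, :, ::2, ::2]` slicing gives the same outputs and the same weight gradients as running the same convolution with stride 2. A small hedged check (assuming PyTorch; the layer names and sizes are just illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# One convolution, evaluated two ways with shared weights.
conv_s1 = nn.Conv2d(3, 8, kernel_size=3, padding=1, stride=1)
conv_s2 = nn.Conv2d(3, 8, kernel_size=3, padding=1, stride=2)
conv_s2.load_state_dict(conv_s1.state_dict())  # identical weights and bias

x = torch.randn(1, 3, 16, 16)

# Variant A: stride-1 conv, then discard every second spatial element.
out_a = conv_s1(x)[:, :, ::2, ::2]
# Variant B: the usual stride-2 conv.
out_b = conv_s2(x)

print(torch.allclose(out_a, out_b))  # True: forward passes match

out_a.sum().backward()
out_b.sum().backward()
print(torch.allclose(conv_s1.weight.grad, conv_s2.weight.grad))  # True: weight gradients match
```

In other words, the slicing is numerically identical to a standard stride-2 convolution in both the forward and backward pass, so gradient flow is the same as in an ordinary strided MobileNetV2 block.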