Closed: benhgm closed this issue 2 years ago
Hi Benjamin,
The encoder is EfficientNet and was trained for image classification. As such, the whole model contains convolutions that downsample by up to a factor of 32 (I think), as well as a flattening layer and a fully-connected layer to predict the 1,000 class probabilities.
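For reference, you can see those pieces directly in the efficientnet-pytorch package (the attribute names below are from the lukemelas implementation; the sizes in the comments assume efficientnet-b0):

```python
from efficientnet_pytorch import EfficientNet

model = EfficientNet.from_pretrained('efficientnet-b0')

print(model._conv_stem)    # stride-2 stem convolution
print(len(model._blocks))  # 16 MBConv blocks for b0, downsampling to an overall stride of 32
print(model._avg_pooling)  # global average pooling (collapses the spatial dims)
print(model._fc)           # Linear(1280 -> 1000): the ImageNet classification head
```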
The parameter self.downsample in the encoder allows you to control how much you want to downsample your feature spatial size. Then we use delete_unused_layers to get rid of the EfficientNet layers that were not used in the process of feature extraction.
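Roughly, it could look like the sketch below (a minimal illustration on top of efficientnet-pytorch, not the exact code from this repo; the Encoder/get_features names and the b0 backbone are just for the example): run the stem and the MBConv blocks, keep an intermediate feature map every time the resolution halves, fuse the two maps that bracket the requested downsample factor, and delete the classification head since it is never used.

```python
import torch
from torch import nn
from efficientnet_pytorch import EfficientNet


class Encoder(nn.Module):
    """Illustrative feature extractor on top of efficientnet-pytorch."""

    def __init__(self, downsample=8):
        super().__init__()
        assert downsample in (8, 16)
        self.downsample = downsample
        self.backbone = EfficientNet.from_pretrained('efficientnet-b0')
        self.delete_unused_layers()

    def delete_unused_layers(self):
        # The classification head (head conv, pooling, dropout, fc) is never
        # touched during feature extraction, so drop it to free parameters.
        del self.backbone._conv_head
        del self.backbone._bn1
        del self.backbone._avg_pooling
        del self.backbone._dropout
        del self.backbone._fc

    def get_features(self, x):
        # Stem, then MBConv blocks; record an endpoint each time the
        # spatial resolution halves (strides 2, 4, 8, 16, then 32 at the end).
        endpoints = {}
        x = self.backbone._swish(self.backbone._bn0(self.backbone._conv_stem(x)))
        prev_x = x
        for idx, block in enumerate(self.backbone._blocks):
            drop = self.backbone._global_params.drop_connect_rate
            if drop:
                drop *= float(idx) / len(self.backbone._blocks)
            x = block(x, drop_connect_rate=drop)
            if prev_x.size(2) > x.size(2):
                endpoints[f'reduction_{len(endpoints) + 1}'] = prev_x
            prev_x = x
        endpoints[f'reduction_{len(endpoints) + 1}'] = x

        # Pick the two endpoints that bracket the requested downsample factor
        # and fuse them: upsample the coarser one and concatenate.
        if self.downsample == 8:
            coarse, fine = endpoints['reduction_4'], endpoints['reduction_3']
        else:  # downsample == 16
            coarse, fine = endpoints['reduction_5'], endpoints['reduction_4']
        coarse = nn.functional.interpolate(
            coarse, scale_factor=2, mode='bilinear', align_corners=False
        )
        return torch.cat([coarse, fine], dim=1)


# Example: Encoder(downsample=8).get_features(torch.randn(1, 3, 224, 224))
# -> shape (1, 152, 28, 28) for b0, i.e. 1/8 of the 224x224 input.
```

With b0 this gives 40 + 112 = 152 channels at 1/8 resolution, or 112 + 320 = 432 channels at 1/16 resolution, which is the kind of value that ends up in something like upsampling_in_channels.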
I hope that answers your question :)
Hi, I couldn't understand how the self.downsample parameter was set (why 8 and 16, and how it affects upsampling_in_channels), or why delete_unused_layers is required in the encoder model. I searched the efficientnet-pytorch implementation but couldn't find any reference to this operation. Could you briefly explain why it is required? Thank you!