Open celidos opened 3 years ago
Thanks for the report @celidos , the documentation is wrong; it should indicate that the default is `True`.
@fmassa, do you remember why the `transform_input` and `aux_logits` parameters are passed as kwargs? It looks like we could just add them as regular parameters (we can make them keyword-only if we want to)?
I feel like we should avoid kwargs unless we really need to, as they obfuscate the documentation, like here. Also, in the googlenet code we're modifying the kwargs dictionary in place, and as a user I would find this fairly unexpected.
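To illustrate the pattern being criticized, here is a hypothetical, stripped-down sketch of the factory-function style in question; it is not the actual torchvision source, and the function body is invented for illustration:

```python
def googlenet(pretrained=False, **kwargs):
    """Hypothetical simplification of the factory pattern -- NOT the real source."""
    if pretrained:
        # The default is injected into the caller's kwargs dict in place.
        # It never appears in the function signature, so the documented
        # default (`transform_input=False`) can silently drift out of sync.
        kwargs.setdefault("transform_input", True)
    return kwargs  # stand-in for `GoogLeNet(**kwargs)`
```

With `pretrained=True`, the effective default becomes `transform_input=True` even though the signature documents nothing of the sort, which is exactly the kind of mismatch this issue reports.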
Hi,
The situation is a bit complicated, and I agree we should improve the documentation.
The problem is that the pre-trained weights for Inception and GoogleNet were converted from TF, which uses a different input normalization.
In order to make the models compatible with the rest of torchvision, we added this `transform_input` argument.
This argument can be seen as an internal implementation detail, which gets enabled if you load the default pre-trained weights that we provide (which were converted from the original implementation in TF).
So if you are training your model from scratch and you are using the default imagenet mean / std values, then you don't need to change anything.
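The per-channel re-normalization that `transform_input` performs can be sketched in plain Python. The constants below are the standard ImageNet mean/std and the TF-style `(x - 0.5) / 0.5` normalization; the helper function is a simplified stand-in for what the model does internally, not the actual torchvision code:

```python
# ImageNet normalization constants used by most torchvision models.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def transform_input(channels):
    """Map ImageNet-normalized channel values to TF-style [-1, 1] values.

    Derivation: if x = (p - mean) / std, then the TF-style value
    (p - 0.5) / 0.5 equals x * (std / 0.5) + (mean - 0.5) / 0.5.
    """
    return tuple(
        x * (std / 0.5) + (mean - 0.5) / 0.5
        for x, mean, std in zip(channels, IMAGENET_MEAN, IMAGENET_STD)
    )

# Sanity check: a raw pixel value p normalized both ways must agree
# after the transform is applied to the ImageNet-normalized version.
p = 0.7
imagenet = tuple((p - m) / s for m, s in zip(IMAGENET_MEAN, IMAGENET_STD))
tf_style = tuple((p - 0.5) / 0.5 for _ in range(3))
assert all(abs(a - b) < 1e-9 for a, b in zip(transform_input(imagenet), tf_style))
```

So the flag simply re-maps an ImageNet-normalized tensor into the normalization the TF-converted weights expect.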
The fix in https://github.com/pytorch/vision/pull/4137 can possibly create a mirror problem: if `pretrained=False`, you would expect the model to transform the input by default, but the GoogleNet class has the default parameter `transform_input=False`, so it will not perform the transformation.
How can this be brought to a common style without causing misunderstandings?
@celidos if `pretrained=False`, the model shouldn't transform the input by default, because we are assuming that all models have the same input normalization.
It's only when `pretrained=True` that we should transform the input, as the weights have been ported from TF.
A probably better thing to do would have been to embed the scaling factors in the weights / bias of the first convolutional layer in the pre-trained weights; this way we wouldn't have had to add this `transform_input` argument at all.
I agree with Francisco that the documentation needs to be improved and that `transform_input` should probably become true only when `pretrained=True`.
> A probably better thing to do would have been to embed the scaling factors in the weights / bias of the first convolutional layer in the pre-trained weights; this way we wouldn't have had to add this `transform_input` at all.
I would advise against embedding the scaling factor in the weights of the first convolution because this can create tricky situations in transfer learning. The problematic scenario is when someone decides to train end-to-end from pre-trained weights. Since the single scaling parameter will be absorbed into the weights of the convolution, there will be no mechanism during the updates that ensures all weights are updated proportionally. Hence, due to random effects caused by the minibatch, some weights in the convolution can be updated disproportionately, causing issues during training. The issue can be mitigated with small LRs at the beginning, and since one trains end-to-end, all weights should hopefully adjust eventually, but it still creates a situation where the user must be careful or they might mess up the training.
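For context, the folding being discussed is a simple algebraic rewrite: for a per-channel affine pre-transform `x'_c = a_c * x_c + s_c` followed by a convolution, one can set `W'_c = W_c * a_c` and `b' = b + sum_c W_c * s_c` and drop the pre-transform entirely. The sketch below verifies this for a single 1x1-conv output unit (a dot product over channels); the weight and input values are made up for illustration:

```python
# Per-channel scale/shift implied by the ImageNet -> TF re-normalization.
a = tuple(s / 0.5 for s in (0.229, 0.224, 0.225))
s = tuple((m - 0.5) / 0.5 for m in (0.485, 0.456, 0.406))

W = (0.3, -0.7, 0.2)  # conv weights (made up)
b = 0.1               # conv bias (made up)

def conv(weights, bias, x):
    # One output unit of a 1x1 convolution: dot product over channels + bias.
    return sum(w * xi for w, xi in zip(weights, x)) + bias

# Fold the affine transform into the conv parameters.
W_folded = tuple(w * ai for w, ai in zip(W, a))
b_folded = b + sum(w * si for w, si in zip(W, s))

x = (0.5, -1.2, 2.0)  # some ImageNet-normalized input (made up)
x_transformed = tuple(ai * xi + si for ai, xi, si in zip(a, x, s))
assert abs(conv(W, b, x_transformed) - conv(W_folded, b_folded, x)) < 1e-9
```

The equivalence holds exactly at initialization, which is precisely why the concern above matters: once folded, the scale is no longer a shared parameter, so gradient updates are free to drift the per-weight scaling apart during fine-tuning.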
Hello!
I recently noticed that I might be doing image normalization twice in my experiments. The documentation says that the default value of the `transform_input` parameter is `False`, so when calling the model I would probably expect it not to do any input transformations, but it accidentally does (permalink) until you directly specify `transform_input=False`. So in the case of `pretrained=True` and an unspecified `transform_input`, the model silently sets its value to `True`. This is confusing to me, and it only happens in GoogleNet.