Hi @minygd,
I believe this issue is possibly caused by the BatchNorm parameters in the shared network oscillating across different tasks, combined with the fact that the test dataset is not sampled in a fixed order.
But from your table above, all metrics except the RMSE look quite stable to me.
As a sanity check, could you set `shuffle=True` in the test dataset and see whether it produces a more stable result?
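That is, something like this (a minimal sketch; `test_set` and the batch size here are just placeholders):

```python
from torch.utils.data import DataLoader

# Placeholder dataset and batch size; only the shuffle flag matters here.
test_loader = DataLoader(test_set, batch_size=8, shuffle=True)
```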
Here is my conditional BN code:

```python
import torch.nn as nn

class ConditionalBatchNorm2d(nn.Module):
    def __init__(self, num_features, num_classes):
        super().__init__()
        self.num_features = num_features
        # Normalise without learnable affine parameters; the per-class
        # scale and bias come from the embedding below instead.
        self.bn = nn.BatchNorm2d(num_features, affine=False)
        self.embed = nn.Embedding(num_classes, num_features * 2)
        self.embed.weight.data[:, :num_features].normal_(1, 0.02)  # initialise scale at N(1, 0.02)
        self.embed.weight.data[:, num_features:].zero_()           # initialise bias at 0

    def forward(self, x, y):
        out = self.bn(x)
        # Look up the class-conditional scale (gamma) and bias (beta) for each sample.
        gamma, beta = self.embed(y).chunk(2, 1)
        out = gamma.view(-1, self.num_features, 1, 1) * out + beta.view(-1, self.num_features, 1, 1)
        return out
```
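In case it helps, a hypothetical usage sketch (the feature count, class count, and shapes are purely illustrative, not from the repo):

```python
import torch

# y holds one class/task index per sample, used to select gamma and beta.
cbn = ConditionalBatchNorm2d(num_features=64, num_classes=10)
x = torch.randn(8, 64, 32, 32)
y = torch.randint(0, 10, (8,))
out = cbn(x, y)  # same shape as x, normalised with class-specific scale and bias
```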
Let me know the result.
Thanks.
Hi @lorenmt,
I think the issue may be caused by a different evaluation perspective: a metric like RMSE should be aggregated sample by sample. But it is confusing that `Abs_Rel` also shows some difference. I changed the test `DataLoader` to `shuffle=False`, and the results are those mentioned above.
Thanks for your advice; I will switch the model to Conditional BatchNorm and report the results as soon as possible. By the way, would you mind telling me why Conditional BatchNorm matters here?
Thanks again for your kind and quick reply.
Hi,
I am not sure I understand your comment on "sample by sample" aggregation for the RMSE metric; I thought we just compute the RMSE for each batch and then report the average error across batches? Also, could you confirm whether this issue is specific to this model, or a general problem in all PyTorch-based evaluations? Can you observe a similar phenomenon with a standard image classification model, e.g. VGG-16 on CIFAR-100?
I am sorry, I just realised that Conditional BN, as I suggested in my last comment, is not actually suitable here. For a dense prediction problem we have multiple labels for each specific input, so there is nothing to condition on.
I had confused this with the Visual Decathlon setting, where we try to solve multiple datasets in a single network; in that case, conditional BN is essential to condition on the correct dataset and compute the corresponding normalisation parameters.
Hi,
Yes, you are right. Thank you for introducing conditional BN to me. As I mentioned above, with `batch_size = 1` the RMSE is calculated as $RMSE = \sqrt{\frac{1}{HW} \sum_{i=1}^{HW} (\hat{y}_i - y_i)^2}$, which is correct. But when we change the batch size, the equation becomes $RMSE = \sqrt{\frac{1}{BHW} \sum_{i=1}^{BHW} (\hat{y}_i - y_i)^2}$. This looks fine on its own, but a discrepancy appears when we average over all groups ($Group = \texttt{len(TestData)} / \texttt{batch\_size}$). For example, with `batch_size = 2` the former scheme computes $RMSE = \frac{1}{2 \cdot Group} \sum_{n=1}^{2 \cdot Group} \sqrt{\frac{1}{HW} \sum_{i=1}^{HW} (\hat{y}_{n,i} - y_{n,i})^2}$, while the latter computes $RMSE = \frac{1}{Group} \sum_{g=1}^{Group} \sqrt{\frac{1}{2HW} \sum_{i=1}^{2HW} (\hat{y}_{g,i} - y_{g,i})^2}$. Comparing the two equations: since the square root is concave, the per-image average never exceeds the per-batch value, and the former can be as small as $\sqrt{\frac{1}{2}}$ times the latter (see the quick numeric check below).
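Here is a quick numeric check (a sketch I put together just for illustration; the error values are arbitrary) showing the two schemes diverging on a single group of two images:

```python
import torch

H, W = 4, 4
# Two depth maps with deliberately different per-pixel errors.
err1 = torch.full((H, W), 0.1)  # small error on image 1
err2 = torch.full((H, W), 1.0)  # large error on image 2

# Former scheme: per-image RMSE, then average over the two images.
per_image = 0.5 * (err1.pow(2).mean().sqrt() + err2.pow(2).mean().sqrt())

# Latter scheme: one RMSE over all 2*H*W pixels of the batch.
per_batch = torch.cat([err1, err2]).pow(2).mean().sqrt()

print(per_image.item())  # 0.55
print(per_batch.item())  # ~0.71 -- larger, as the derivation predicts
```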
Please let me know whether my derivation explains the discrepancy. Thanks!
Yes, thanks for the explanation. I think I understand the issue now. Since it's a mathematical problem, I believe an easy fix is simply to compute the average across H*W inside the root (pixel-wise average) and then average across the batch dimension outside the root; then the result should be consistent for any batch size, as in the sketch below.
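Something like this (a rough sketch, assuming `pred` and `gt` are `[B, H, W]` depth tensors; the function name is just for illustration):

```python
import torch

def rmse_batch_consistent(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    se = (pred - gt).pow(2)                       # [B, H, W] squared errors
    per_image = se.flatten(1).mean(dim=1).sqrt()  # pixel-wise average inside the root: [B]
    return per_image.mean()                       # batch average outside the root
```

Accumulating `per_image.sum()` over the whole test set and dividing by `len(TestData)` at the end then gives the same number for any batch size.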
The rest of the metrics, for example Abs Rel, having small variations is fine to me. If you really want to dig deeper, you can confirm whether the predictions for the same data under different batch sizes are identical (see the quick check below); if they are, the remaining variation should just be another numerical issue.
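For example, a check could look like this (hypothetical; `model` and the batch `x` are placeholders):

```python
import torch

model.eval()  # BN uses running statistics, so outputs should not depend on batching
with torch.no_grad():
    p1 = torch.cat([model(x[i:i + 1]) for i in range(x.size(0))])  # batch size 1
    p2 = model(x)                                                  # full batch
print((p1 - p2).abs().max())  # should be ~0 up to floating-point noise
```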
Best, Sk.
Thanks, my advice for calculating a metric like RMSE is just as you described. It was a nice discussion for me.
Best Regards,
H-X.
Hello, sorry to bother you, but I found that different `batch_size` settings may cause different evaluation results. For example, the following are the metrics of SGNet-MTAN, where `batch_size` is used as the test batch size (RMSE is modified according to the equation above).