megvii-research / BBN

The official PyTorch implementation of paper BBN: Bilateral-Branch Network with Cumulative Learning for Long-Tailed Visual Recognition
https://arxiv.org/abs/1912.02413

Difference between paper and code #2

Closed valencebond closed 4 years ago

valencebond commented 4 years ago
[screenshot: the cumulative learning formula from Section 4.3 of the paper]

The corresponding code is:

mixed_feature = 2 * torch.cat((l * feature_a, (1 - l) * feature_b), dim=1)
output = model(mixed_feature, classifier_flag=True)

According to the code, the objective introduced in Section 4.3 may not be achievable, since the features are concatenated and then fed to a single classifier.

> Cumulative learning strategy is proposed to shift the learning focus between the bilateral branches by controlling the weights for features produced by two branches and the classification loss L.

Would you mind telling me the reason behind this change?
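
For reference, a minimal sketch of the two formulations side by side; the sizes, feature tensors, and classifier modules (`fc_a`, `fc_b`, `fc`) are hypothetical placeholders, not the repository's actual model:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
feat_dim, n_cls = 8, 4                    # made-up sizes for illustration
feature_a = torch.randn(2, feat_dim)      # conventional-branch features f_c
feature_b = torch.randn(2, feat_dim)      # re-balancing-branch features f_r
l = 0.7                                   # cumulative-learning weight alpha

# Paper, Section 4.3: two classifiers, weighted sum of their logits
fc_a = nn.Linear(feat_dim, n_cls, bias=False)
fc_b = nn.Linear(feat_dim, n_cls, bias=False)
z_paper = l * fc_a(feature_a) + (1 - l) * fc_b(feature_b)

# Code: weighted features are concatenated and fed to a single classifier
fc = nn.Linear(2 * feat_dim, n_cls, bias=False)
mixed_feature = 2 * torch.cat((l * feature_a, (1 - l) * feature_b), dim=1)
z_code = fc(mixed_feature)
```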

ZhouBoyan commented 4 years ago

Actually, the two fully connected layers can be merged into one for simplicity. Please refer to the formula below:
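
A sketch of what the attached derivation presumably shows, writing $W = [W_c; W_r]$ for the merged classifier, $\alpha$ (the code's `l`) for the cumulative-learning weight, and $m$ for `mixed_feature`:

$$
z = W^\top m
  = \begin{bmatrix} W_c \\ W_r \end{bmatrix}^\top
    \begin{bmatrix} 2\alpha\, f_c \\ 2(1-\alpha)\, f_r \end{bmatrix}
  = 2\bigl(\alpha\, W_c^\top f_c + (1-\alpha)\, W_r^\top f_r\bigr),
$$

which is the paper's weighted sum of the two branches' logits, up to the constant factor 2.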

valencebond commented 4 years ago

Thanks! Does that mean manifold mixup by concatenation works better than the original sum? Also, why is the scale factor of 2 needed?

mixed_feature = 2 * torch.cat((l * feature_a, (1 - l) * feature_b), dim=1)

ZhouBoyan commented 4 years ago

The scale of 2 ensures the gradient stays consistent with the default combiner (e.g., when the two samplers draw the same picture and l = 0.5).
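
A minimal numeric check of that claim, under the assumption that both branches produce the same feature and l = 0.5; the tensors here are made-up placeholders:

```python
import torch

torch.manual_seed(0)
feat_dim, n_cls = 8, 4
f = torch.randn(1, feat_dim)              # the same feature from both branches
W = torch.randn(2 * feat_dim, n_cls)      # merged classifier [W_c; W_r]

l = 0.5
mixed = 2 * torch.cat((l * f, (1 - l) * f), dim=1)  # scale 2 makes this cat(f, f)
z_bbn = mixed @ W

# A plain single-branch network with classifier W_c + W_r produces the same
# logits, so the forward pass (and hence the gradient) matches the default combiner.
z_plain = f @ (W[:feat_dim] + W[feat_dim:])

print(torch.allclose(z_bbn, z_plain))     # True
```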

nisargshah1999 commented 3 years ago

> Thanks! Does that mean manifold mixup by concatenation works better than the original sum? Also, why is the scale factor of 2 needed?
>
> mixed_feature = 2 * torch.cat((l * feature_a, (1 - l) * feature_b), dim=1)

Hi @valencebond @ZhouBoyan, is this equivalent to the formulation in the paper, or does the concat method perform better than the original sum?

Thanks