Transfer Learning - Githubissues

ThomasNorr commented 2 years ago

Hello,

thanks for your fascinating work. I am trying to use the B-cos network (the DenseNet-121 named “densenet_121_cossched”) in my research, but I am struggling to get it to transfer effectively to smaller datasets, e.g. CUB2011. It overfits much more (much worse final test accuracy) and improves much more slowly than the conventional DenseNet. In fact, retraining only the final layer leads to no learning whatsoever across a range of hyperparameters that all work for the conventional one. Since you have experience with training this network, I figured I might just ask you:

1. How sensitive was the training to the choice of hyperparameters? Do I “just” need to tune regularization etc.?
2. Have you done any experiments on transfer learning and maybe figured out effective methods?
3. Could B-cos networks be less suited for fine-grained classification?
4. How exactly did you train the model? From my point of view your training code does not work, as the trainer class does not get a “loss” argument, or am I missing something? I am encountering NaN as the loss when using BCE.
5. Did you investigate the impact of norming w in Equation 3 in the paper (maybe with increased B) so that the model rescales the outputs itself?

Any answers would be greatly appreciated :)

Greetings

moboehle commented 2 years ago

Hi Gnabe, sorry for the delay in answering and thanks for your interest in the project!

Since there are no normalisation layers in the model, it is indeed somewhat more sensitive to a good choice of hyperparameters; in fact, we are currently investigating how to facilitate training for the B-cos networks. Regarding your questions:

1. Since the weight vectors are normalised before being applied to the input, an L2 regularisation (weight decay) on the model weights will probably not change much about the learning behaviour. If you need to regularise the models, I would recommend using dropout or augmenting your data. (Potentially, an L1 regularisation of the weights might also work for B-cos models.)
2. So far, I have not used B-cos models for transfer learning, but I would not have expected any major problems. What issue are you facing exactly? If the loss is NaN or vanishingly small, you might need to adjust the layer scaling (scale_fact parameter in BcosConv2d) to ensure that the model output lies in a reasonable range.
3. Since the B-cos networks work well on ImageNet, which involves a lot of fine-grained classification among dog breeds, I would say they seem to generally work for fine-grained classification too. However, fine-grained classification tasks such as CUB might lead to less interpretable explanations, as the individual classes share many of the same features and the explanations for different classes might thus look similar.
4. Sorry for the confusion in the code. The trainer receives the loss argument via the exp_params (e.g., see https://github.com/moboehle/B-cos/blob/5f9218f6773534c80367793d1cd767742869764a/experiments/Imagenet/bcos/experiment_parameters.py#L55).
5. I assume you mean not norming w in Equation 3? While this could potentially work, it gives the model more 'slack' w.r.t. how to solve the classification problem, which might hurt the interpretability. It might, of course, help the optimisation.
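
To illustrate the points about weight normalisation and output scaling, here is a minimal pure-Python sketch of a single B-cos unit. It is an assumption-laden toy, not the repository's code: the function name `b_cos` and the `scale` divisor are hypothetical (the latter only stands in for whatever `scale_fact` controls in `BcosConv2d`), and the transform is written in the form "unit-norm weights times input, down-weighted by |cos|^(B-1)".

```python
import math

def b_cos(x, w, b=2.0, scale=1.0):
    """Toy B-cos response of one unit (hypothetical helper, not the repo's API)."""
    w_norm = math.sqrt(sum(wi * wi for wi in w))
    x_norm = math.sqrt(sum(xi * xi for xi in x))
    # Normalise the weights to unit length before applying them to the input.
    w_hat = [wi / w_norm for wi in w]
    # Linear response with unit-norm weights: equals ||x|| * cos(x, w).
    lin = sum(wi * xi for wi, xi in zip(w_hat, x))
    cos = lin / x_norm
    # Down-weight poorly aligned inputs by |cos|^(B-1), then rescale the output.
    # `scale` is an assumed stand-in for the layer's output scaling.
    return lin * abs(cos) ** (b - 1) / scale

x = [3.0, 4.0]
w = [1.0, 0.0]

out = b_cos(x, w)
# Multiplying the raw weights by 10 leaves the output unchanged,
# so a penalty that only shrinks the norm of w has little effect.
out_big_w = b_cos(x, [10.0 * wi for wi in w])
# The output range is instead controlled by the explicit scaling.
out_scaled = b_cos(x, w, scale=10.0)

print(out, out_big_w, out_scaled)
```

Because the output does not depend on the length of w, the magnitude of the logits has to be set elsewhere, which is why the layer scaling is the natural knob when the loss becomes NaN or vanishingly small.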

I hope this helps!

Best Moritz

ThomasNorr commented 1 year ago

Hi Moritz,

thanks for the answer.

1. That regularisation is hard is something I found out too. Dropout did not work for me (I think it changes the angle too much?), and I tried adding Gaussian noise to slightly alter the angle, but that was also ineffective. Data augmentation has been very effective, but not sufficiently so to prevent overfitting. I will try L1.
2. I think the problem mostly lies in the small dataset size of transfer learning problems, which leads to overfitting.
3. Good point regarding the localization, thanks.
4. I see, thanks.
5. Okay, thanks, I will try that too.
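
As a note on the L1 idea: one possible reading, and it is only an assumption on my side, is to penalise the L1 norm of the unit-normalised weight vector rather than of the raw weights, since the raw scale is removed by the normalisation anyway. The helper name `l1_of_normalised` below is hypothetical and the snippet is a sketch of that reading, not something taken from the repository.

```python
import math

def l1_of_normalised(w):
    """L1 norm of w after scaling it to unit L2 norm (hypothetical penalty term)."""
    l2 = math.sqrt(sum(wi * wi for wi in w))
    return sum(abs(wi) for wi in w) / l2

sparse = [2.0, 0.0, 0.0, 0.0]
dense = [1.0, 1.0, 1.0, 1.0]

p_sparse = l1_of_normalised(sparse)  # smallest possible value: one active weight
p_dense = l1_of_normalised(dense)    # larger: the weight mass is spread out
# Rescaling the raw weights does not change the penalty.
p_dense_scaled = l1_of_normalised([5.0 * wi for wi in dense])

print(p_sparse, p_dense, p_dense_scaled)
```

Such a term would favour sparse weight vectors without being cancelled by the weight normalisation; whether it actually reduces overfitting here is something I would still have to test.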

Thanks a lot :)

Best

moboehle / B-cos

Transfer Learning #1