moboehle / B-cos

B-cos Networks: Alignment is All we Need for Interpretability

Transfer Learning #1

Closed ThomasNorr closed 1 year ago

ThomasNorr commented 2 years ago

Hello,

thanks for your fascinating work. I am trying to use the B-cos network (the DenseNet-121 named “densenet_121_cossched”) in my research, but I am struggling to get it to transfer effectively to smaller datasets, e.g. CUB2011. It overfits much more (much worse final test accuracy) and improves much more slowly than the conventional DenseNet. Retraining only the final layer leads to no learning whatsoever across a range of hyperparameters that all work for the conventional one. Since you have experience with training this network, I figured I might just ask you:

Any answers would be greatly appreciated :)

Greetings
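For readers unfamiliar with the setup mentioned above, "retraining only the final layer" can be sketched roughly as follows. This is a minimal illustration in plain PyTorch, not the B-cos code: the tiny `backbone`, the linear `head`, and the helper `freeze_all_but_head` are hypothetical stand-ins, and the random tensors replace a real CUB data loader. The actual B-cos DenseNet and its loading API are not shown here.

```python
import torch
from torch import nn


def freeze_all_but_head(model: nn.Module, head: nn.Module) -> list:
    """Disable gradients everywhere except in `head`; return the trainable parameters."""
    for p in model.parameters():
        p.requires_grad = False
    for p in head.parameters():
        p.requires_grad = True
    return [p for p in model.parameters() if p.requires_grad]


torch.manual_seed(0)

# Hypothetical stand-in for a pretrained feature extractor (NOT the B-cos DenseNet).
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
# New classification layer; CUB-200-2011 has 200 classes.
head = nn.Linear(8, 200)
model = nn.Sequential(backbone, head)

trainable = freeze_all_but_head(model, head)
optimizer = torch.optim.SGD(trainable, lr=0.1)

# Random data in place of a real data loader (assumption for illustration).
images = torch.randn(4, 3, 32, 32)
labels = torch.randint(0, 200, (4,))

# Snapshots so we can check which weights actually move.
backbone_before = [p.clone() for p in backbone.parameters()]
head_before = head.weight.clone()

# One optimisation step: only the head receives gradients and is updated.
model.train()
loss = nn.functional.cross_entropy(model(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Comparing `backbone_before` with the current backbone parameters after the step is a quick sanity check that the feature extractor really stayed frozen and only the new layer was trained.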

moboehle commented 2 years ago

Hi Gnabe, sorry for the delay in answering and thanks for your interest in the project!

Since there are no normalisation layers in the model, it is indeed somewhat more sensitive to a good choice of hyperparameters; in fact, we are currently investigating how to facilitate training for the B-cos networks. Regarding your questions:

I hope this helps!

Best,
Moritz

ThomasNorr commented 1 year ago

Hi Moritz,

thanks for the answer.

Thanks a lot :)

Best