rahafaljundi / MAS-Memory-Aware-Synapses

Memory Aware Synapses method implementation code

the omega penalty term in L1 #4

Open b03505052 opened 4 years ago

b03505052 commented 4 years ago

Hi Rahaf,

From my understanding of your paper, the penalty term (theta - theta*) is not limited to MSE, right? Have you tried an L1 loss, or even used the L1 difference to derive the g_ij?

rahafaljundi commented 4 years ago

Hi, we tried L1 in a small experiment but we didn't observe an improvement.

L2 was based on the Bayesian view and seemed more plausible, but L1 can also have nice properties, so please go ahead and I will be happy to discuss further. I didn't get what you mean here by "even use L1 difference to derive the g_ij".
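To make the two options being discussed concrete, here is a minimal sketch (my own illustration, not code from this repository) of the MAS-style regularizer with the squared (L2) penalty from the paper versus the L1 variant the question asks about. The function names and `lam` parameter are hypothetical:

```python
import numpy as np

def mas_penalty_l2(theta, theta_star, omega, lam=1.0):
    """Paper's penalty: lam * sum_ij omega_ij * (theta_ij - theta*_ij)^2."""
    return lam * np.sum(omega * (theta - theta_star) ** 2)

def mas_penalty_l1(theta, theta_star, omega, lam=1.0):
    """L1 variant: lam * sum_ij omega_ij * |theta_ij - theta*_ij|."""
    return lam * np.sum(omega * np.abs(theta - theta_star))
```

The L2 form penalizes large drifts quadratically, while the L1 form penalizes all drift linearly, which changes how strongly important parameters are anchored.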

b03505052 commented 4 years ago

Hi Rahaf,

Sorry for the unclear expression. I meant: have you ever taken the L1 norm as the objective function to derive the importance weights? I just found it in the MAS_based_Training.py file, so I have no further questions on this issue, thanks! BTW, did you compare against any transfer learning methods? I think regularization-based methods are similar to the transfer setting. And I have another question: is there any reason for the choice of optimizer (SGD) and scheduler?

Thanks!
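For reference, the importance-weight derivation discussed above can be sketched as follows. This is my own toy illustration, not the repository's implementation: MAS estimates omega_ij as the average magnitude of the gradient of the (squared) L2 norm of the network output with respect to each parameter, and the variant asked about replaces that objective with the L1 norm of the output. For a linear model y = W x both gradients have closed forms, so no autograd is needed; the names `importance_l2` and `importance_l1` are hypothetical:

```python
import numpy as np

def importance_l2(W, xs):
    """Mean |gradient| of ||W x||^2 w.r.t. W: d(||Wx||^2)/dW = 2 (W x) x^T."""
    omega = np.zeros_like(W)
    for x in xs:
        omega += np.abs(2.0 * np.outer(W @ x, x))
    return omega / len(xs)

def importance_l1(W, xs):
    """L1 variant: d(||Wx||_1)/dW = sign(W x) x^T."""
    omega = np.zeros_like(W)
    for x in xs:
        omega += np.abs(np.outer(np.sign(W @ x), x))
    return omega / len(xs)
```

The L2 objective scales the gradient by the output magnitude, while the L1 objective keeps only its sign, so the resulting importance maps can weight parameters quite differently.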