Open b03505052 opened 4 years ago
Hi, we tried L1 in a small experiment but didn't observe an improvement.
The L2 penalty was motivated by the Bayesian view and seemed more plausible, but L1 can also have nice properties, so please go ahead and I'd be happy to discuss further. I didn't get what you meant by "even use L1 difference to derive the gij".
Hi Rahaf,
Sorry for the unclear expression. I meant: have you ever taken the L1 norm as the objective function to derive the importance weights? I just found it in the MAS_based_Training.py file, so this issue is resolved for me, thanks! By the way, did you compare against any transfer learning methods? I think regularization-based methods are similar to the transfer learning setting. One more question: is there any particular reason for the choice of optimizer (SGD) and scheduler?
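For other readers of this thread: a minimal NumPy sketch of what "deriving the importance weights from an output norm" means for a single linear layer. This is an illustrative reconstruction, not the repository's actual code; the function name `mas_importance` and the single-layer setup are my own simplification. MAS accumulates the absolute gradient of the (squared) L2 norm of the network output with respect to each parameter; the L1 variant discussed here simply swaps in the L1 norm of the output.

```python
import numpy as np

def mas_importance(W, xs, norm="l2"):
    """Estimate MAS-style importance weights Omega for one linear layer y = W x.

    Omega_ij = mean over samples of |d norm(W x) / d W_ij|.
    norm="l2": squared L2 norm of the output (as in the MAS paper).
    norm="l1": L1 norm of the output (the variant asked about above).
    """
    omega = np.zeros_like(W)
    for x in xs:
        y = W @ x
        if norm == "l2":
            grad = 2.0 * np.outer(y, x)    # d ||Wx||_2^2 / dW = 2 (Wx) x^T
        else:
            grad = np.outer(np.sign(y), x)  # d ||Wx||_1 / dW = sign(Wx) x^T
        omega += np.abs(grad)
    return omega / len(xs)
```

Note the L1 variant only propagates the sign of each output unit, so large activations no longer dominate the importance estimate, which may explain why results differ between the two.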
Thanks!
Hi Rahaf,
From my understanding of your papers, the penalty term (theta - theta*) is not limited to MSE, right? Have you tried an L1 loss, or even used the L1 difference to derive the gij?
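To make the question concrete, here is a small sketch of the two penalty variants being compared, assuming the usual MAS surrogate loss (new-task loss plus a weighted parameter-drift penalty). The function name `mas_penalty` and the flat-vector parameters are my own illustration, not the paper's code.

```python
import numpy as np

def mas_penalty(theta, theta_star, omega, lam, dist="l2"):
    """Drift penalty added to the new-task loss.

    dist="l2": the squared (MSE-like) penalty used in the paper,
        lam/2 * sum_ij Omega_ij * (theta_ij - theta*_ij)^2
    dist="l1": the L1 variant asked about in this thread,
        lam * sum_ij Omega_ij * |theta_ij - theta*_ij|
    """
    diff = theta - theta_star
    if dist == "l2":
        return 0.5 * lam * np.sum(omega * diff ** 2)
    return lam * np.sum(omega * np.abs(diff))
```

One relevant design difference: the L2 penalty gives a gradient proportional to the drift (pulling parameters softly back toward theta*), while the L1 penalty gives a constant-magnitude gradient, behaving more like a sparsity-inducing constraint on parameter change.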