Open lpyhdzx opened 5 months ago
Hi @lpyhdzx, sorry for the late reply!
Even though lambda_1 and lambda_2 are initialized to 0, and the lagrangian_loss is initially 0, the lambdas will still receive gradients during backpropagation. lambda_1 will get a gradient of (expected_sparsity - target_sparsity), and lambda_2 will get a gradient of (expected_sparsity - target_sparsity) ** 2. Therefore, these variables are still learnable.
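To make this concrete, here is a minimal standalone sketch (the sparsity values are placeholders for illustration, not taken from the repo) showing that both lambdas receive nonzero gradients even when they start at zero:

```python
import torch

# Both multipliers start at zero, as in the repo's initialization.
lambda_1 = torch.nn.Parameter(torch.tensor(0.0))
lambda_2 = torch.nn.Parameter(torch.tensor(0.0))

expected_sparsity = torch.tensor(0.3)  # placeholder value
target_sparsity = 0.5                  # placeholder value

gap = expected_sparsity - target_sparsity
lagrangian_loss = lambda_1 * gap + lambda_2 * gap ** 2  # evaluates to 0
lagrangian_loss.backward()

print(lagrangian_loss.item())  # 0.0
print(lambda_1.grad)  # tensor(-0.2000) == expected_sparsity - target_sparsity
print(lambda_2.grad)  # tensor(0.0400)  == (expected_sparsity - target_sparsity) ** 2
```

So even though the loss value is zero on the first step, the optimizer still moves the lambdas away from zero.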
Thanks for the reply!
I borrowed this method but found that the loss gets optimized to a negative value. I guess this is because there is no additional constraint on this Lagrangian loss, so the parameters lambda_1 and lambda_2 can reach negative values.

lagrangian_loss = lambda_1 * (expected_sparsity - target_sparsity) + lambda_2 * (expected_sparsity - target_sparsity) ** 2

I'm not sure if there is any way to avoid this.
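For what it's worth: with an equality constraint, a transiently negative lagrangian_loss is not by itself a bug, but lambdas that keep decreasing without bound usually mean they are being minimized jointly with everything else. In the min-max formulation this penalty comes from, the model parameters minimize the loss while the multipliers are updated by gradient ascent to maximize it. Here is a toy sketch of that idea; the sigmoid "sparsity" model, learning rates, and step count are all made up for illustration and are not the repo's code:

```python
import torch

# Toy stand-in for whatever parameters determine the expected sparsity.
alpha = torch.nn.Parameter(torch.tensor(0.0))
lambda_1 = torch.nn.Parameter(torch.tensor(0.0))
lambda_2 = torch.nn.Parameter(torch.tensor(0.0))
target_sparsity = 0.7

opt_model = torch.optim.SGD([alpha], lr=0.1)  # descent on the model side
# maximize=True (PyTorch >= 1.11) performs gradient ascent on the lambdas.
opt_lambda = torch.optim.SGD([lambda_1, lambda_2], lr=0.1, maximize=True)

for step in range(2000):
    expected_sparsity = torch.sigmoid(alpha)  # toy sparsity estimate
    gap = expected_sparsity - target_sparsity
    lagrangian_loss = lambda_1 * gap + lambda_2 * gap ** 2

    opt_model.zero_grad()
    opt_lambda.zero_grad()
    lagrangian_loss.backward()
    opt_model.step()   # alpha minimizes the Lagrangian
    opt_lambda.step()  # the lambdas maximize it

print(torch.sigmoid(alpha).item())  # drifts toward target_sparsity (0.7)
```

With ascent, lambda_2 can only grow (its gradient, gap ** 2, is nonnegative), so the quadratic term becomes a genuine penalty that pushes the expected sparsity toward the target instead of the lambdas collapsing to negative values.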
Hi! I am facing a similar problem: lag_loss is negative and I am not sure whether it will improve with additional training. In my case the lambda_1 and lambda_2 parameters are taking negative values and keep decreasing. It would be great if you could share insights/advice on the matter.
Thank you!
Hi! Great work! I have a question about the default values of the lambda parameters. I've noticed that they are initialized to zero by default:
lambda_1_layer = torch.nn.Parameter(torch.tensor(0.0, device=self.device))
Given that the Lagrangian loss is calculated using these parameters as follows:

lagrangian_loss = lambda_1 * (expected_sparsity - target_sparsity) + lambda_2 * (expected_sparsity - target_sparsity) ** 2

initializing lambda_1 and lambda_2 to zero seems to imply that the Lagrangian loss component will be zero, as there would be no penalty for deviating from the target sparsity. So, is it intended for the lambda parameters to be initialized to zero, or is there another section of the code where these parameters are set or adjusted after initialization? I appreciate any clarifications or insights you can provide on this matter.