Question on parameter glayerwise

jaehong31 / CGES

Combined Group and Exclusive Sparsity for Deep Neural Networks, ICML 2017


Question on parameter glayerwise #4

Closed lizhenstat closed 4 years ago

lizhenstat commented 5 years ago

Hi, thanks for your work. I have one question on utils.py: what do the following parameters mean?

glayerwise = [1., 1.0, 1./15, 1./144]
elayerwise = [1., 0.5, 15., 144.]

Thanks in advance

lizhenstat commented 5 years ago

Hi, can you post the value of mu_l used for the CIFAR-10 dataset? (What is m there?) Thanks a lot.

lizhenstat commented 5 years ago

@jaehong-yoon93

jaehong31 commented 5 years ago

@lizhenstat Hi, sorry for the late reply. This is simply because the conv params and the fc params show different distributions. You can think of it as using different hyperparameters for the conv and fc layers so that each is pruned properly.

jaehong31 commented 5 years ago

@lizhenstat Thus, if you apply the model to other datasets, the right values depend on the other parameters and on the sparsification level.

lizhenstat commented 5 years ago

@jaehong-yoon93 thanks for your reply. However, I still cannot figure out how you apply this equation to get the above numbers (1, 15, 144). Can you explain it in more detail? Is this pruning parameter different from the following equation in Section 3.2?

$\mu_{l} = m + (1-2m)\frac{l}{L-1}$

Thanks in advance.
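For reference, the Section 3.2 schedule quoted above can be evaluated directly. The sketch below is only an illustration: the function name `mu_schedule` and the sample values m = 0.2 and L = 4 are assumptions chosen for the example, not values taken from the paper's experiments or from the repository.

```python
def mu_schedule(m, num_layers):
    """Evaluate mu_l = m + (1 - 2m) * l / (L - 1) for l = 0, ..., L - 1.

    m          -- the constant from the equation (sample value only, assumed)
    num_layers -- L, the number of layers (must be at least 2)
    """
    if num_layers < 2:
        raise ValueError("the schedule needs at least two layers (L >= 2)")
    return [m + (1.0 - 2.0 * m) * l / (num_layers - 1)
            for l in range(num_layers)]


# Hypothetical example values: m = 0.2, L = 4.
for layer, mu in enumerate(mu_schedule(0.2, 4)):
    print(f"layer {layer}: mu = {mu:.3f}")
```

The schedule moves linearly from m at the first layer to 1 - m at the last, so for any m between 0 and 1 every mu_l stays within [0, 1]. Factors such as 15 or 144 therefore cannot be raw outputs of this formula.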

jackliu333 commented 5 years ago

> @jaehong-yoon93 thanks for your reply. However, I still cannot figure out how you apply this equation to get the above numbers (1, 15, 144). Can you explain it in more detail? Is this pruning parameter different from the following equation in Section 3.2? Thanks in advance.

Same here, not sure how it's applied in the code. Could you shed some light? @jaehong-yoon93

jaehong31 commented 5 years ago

Well, the code uses a slightly modified version of the equation (the change is minor). It simply shifts the balance between the two sparsity terms more drastically at the higher fc layers than the original version does: much stronger exclusive sparsification and much weaker group sparsification at the higher fc layers, which is consistent with our intuition. Also, the exact values did not come from sophisticated tuning, so you can use any tuned parameters; do not worry about it. I think the '0.5' in elayerwise has no specific meaning. It is fine to change it to '1' or something else; actually, I have no idea where it comes from.
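To make the effect of the two lists concrete, here is a minimal sketch of how per-layer factors like these could rescale the two regularizers. It is not the repository's code. It assumes that glayerwise[i] and elayerwise[i] simply multiply the group and exclusive terms of layer i, that each row of the weight matrix is one group, and that the helper names (`group_sparsity`, `exclusive_sparsity`, `layer_penalty`, `ratios`) and the arguments `mu` and `lam` are hypothetical.

```python
import math

# Values quoted from utils.py in the question at the top of this thread.
glayerwise = [1., 1.0, 1. / 15, 1. / 144]
elayerwise = [1., 0.5, 15., 144.]


def group_sparsity(W):
    """(2,1)-norm: sum over groups (assumed: rows) of the L2 norm."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in W)


def exclusive_sparsity(W):
    """(1,2)-norm: half the sum over groups (rows) of the squared L1 norm."""
    return 0.5 * sum(sum(abs(w) for w in row) ** 2 for row in W)


def layer_penalty(W, layer, mu, lam=1.0):
    """Assumed combination: the layer-wise factors scale each term."""
    g = glayerwise[layer] * (1.0 - mu) * group_sparsity(W)
    e = elayerwise[layer] * mu * exclusive_sparsity(W)
    return lam * (g + e)


def ratios():
    """Exclusive-to-group factor ratio per list entry."""
    return [e / g for g, e in zip(glayerwise, elayerwise)]


# The ratio grows from about 1 to about 225 and then about 20736.
print(ratios())
```

Under these assumptions the last two entries weight exclusive sparsity several orders of magnitude more heavily than group sparsity, which matches the description of "much higher exclusive sparsification, and much lower group sparsification" at the higher fc layers.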