Open: secretu opened this issue 10 months ago
Hi, I noticed that the initialization of 'intermediate_z' is different from that of the other masks, which introduces an initial sparsity in the MLP layer. Why is this mask initialized differently?
https://github.com/princeton-nlp/CoFiPruning/blob/da855a809c4a15e1c964a47a37998db2e1a226fd/models/l0_module.py#L147C8-L147C39
https://github.com/princeton-nlp/CoFiPruning/blob/da855a809c4a15e1c964a47a37998db2e1a226fd/models/l0_module.py#L134C9-L134C9
Hi, I believe it's mostly a typo. I also vaguely remember that having an initial sparsity does not affect performance much!
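For anyone wondering why a different initialization shows up as initial sparsity, here is a minimal sketch of a standard hard-concrete gate (the L0 regularization formulation), not the repository's own code. The stretch limits (-0.1, 1.1), the temperature 2/3, and the two initial means (10 and 0) are assumptions chosen for illustration; check the linked lines in l0_module.py for the values actually used.

```python
import math

# Stretch limits and temperature of the hard-concrete distribution.
# Commonly used defaults, assumed here; not read from the repository.
LIMIT_L, LIMIT_R, TEMPERATURE = -0.1, 1.1, 2.0 / 3.0


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def deterministic_gate(log_alpha: float) -> float:
    """Noise-free gate: sigmoid stretched to (LIMIT_L, LIMIT_R), clamped to [0, 1]."""
    stretched = sigmoid(log_alpha) * (LIMIT_R - LIMIT_L) + LIMIT_L
    return min(1.0, max(0.0, stretched))


def prob_nonzero(log_alpha: float) -> float:
    """Probability that a sampled gate is non-zero (the expected-L0 term)."""
    return sigmoid(log_alpha - TEMPERATURE * math.log(-LIMIT_L / LIMIT_R))


# Hypothetical initial means for log_alpha: a large one and a zero one.
for mean in (10.0, 0.0):
    gate = deterministic_gate(mean)
    keep = prob_nonzero(mean)
    print(f"log_alpha mean={mean:>4}: gate={gate:.3f}, P(non-zero)={keep:.3f}")
```

Under these assumptions, a large initial mean saturates the gate at 1 (no sparsity at the start), while a mean near 0 starts the gate around 0.5 with a noticeably lower keep probability, which is the kind of initial sparsity described in the question.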