princeton-nlp / LLM-Shearing

[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
https://arxiv.org/abs/2310.06694
MIT License

duplicate mean values during mask initialization #45

Closed: czhang99 closed this issue 8 months ago

czhang99 commented 8 months ago

I noticed that the mean value for the masking variables is initialized twice in a row. Is one of the two initializations preferred over the other?

https://github.com/princeton-nlp/LLM-Shearing/blob/3560a877e2833c3da393923be0bd6753b6ef1c6d/llmshearing/models/l0_module.py#L45-L46C17

xiamengzhou commented 8 months ago

Hi! Check out issue #3 for a detailed answer!

czhang99 commented 8 months ago

Thanks for pointing me to that issue. It is well explained in https://github.com/princeton-nlp/LLM-Shearing/issues/3. Closing this issue.