duplicate mean values during mask initialization

princeton-nlp / LLM-Shearing

[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning

https://arxiv.org/abs/2310.06694

MIT License


czhang99 closed this issue 9 months ago

czhang99 commented 9 months ago

I noticed that the mean value for the masking variables is initialized twice during mask initialization. Is one of the two initializations preferred over the other?

xiamengzhou commented 9 months ago

Hi! Check out issue #3 for a detailed answer!

czhang99 commented 9 months ago

Thanks for the reference. It is well explained in https://github.com/princeton-nlp/LLM-Shearing/issues/3. Closing the issue.