princeton-nlp / LLM-Shearing

[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
https://arxiv.org/abs/2310.06694
MIT License

The implementation of dynamic batch loading code seems inconsistent with the pseudo-code in the paper #42

Open YWMditto opened 8 months ago

YWMditto commented 8 months ago

For example, the truncation of the loss difference at 0 does not seem to be implemented.

[screenshot: dynamic batch loading pseudo-code from the paper]

https://github.com/princeton-nlp/LLM-Shearing/blob/1386c8f69cfb3bf64896959cf3754d2bf87659c7/llmshearing/callbacks/dynamic_loading_callback.py#L34

Also, what is the purpose of this line? https://github.com/princeton-nlp/LLM-Shearing/blob/1386c8f69cfb3bf64896959cf3754d2bf87659c7/llmshearing/callbacks/dynamic_loading_callback.py#L41
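
For context, my reading of the paper's pseudo-code is roughly the following update at each evaluation step (the symbols here are mine, not the repo's): $\Delta_t[k] = \max(\ell_t[k] - \ell^{\text{ref}}[k],\ 0)$ for each domain $k$, then $\alpha_t \propto \alpha_{t-1} \cdot \exp(\Delta_t)$, renormalized so that $\sum_k \alpha_t[k] = 1$. The clipping at 0 is the part I could not find in the callback.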

xiamengzhou commented 8 months ago

Hi, sorry for the late reply!

In the implementation, we add a small uniform proportion (c = 1e-4/7) to each domain when updating the weights. It is simply a smoothing factor and does not affect the results much in practice.
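
For concreteness, here is a minimal NumPy sketch of the update as I understand it from the paper and the reply above: the per-domain loss difference is clipped at zero, the weights are re-scaled exponentially, and a small uniform constant c is added before renormalizing. The function name and arguments are illustrative only and do not correspond to the repo's actual callback API.

```python
import numpy as np

def update_domain_weights(prev_weights, current_losses, reference_losses, c=1e-4 / 7):
    """Sketch of the dynamic batch loading weight update (illustrative, not the repo's code).

    prev_weights, current_losses, reference_losses: arrays of length n_domains.
    c: small uniform smoothing proportion added to every domain, as described above.
    """
    prev_weights = np.asarray(prev_weights, dtype=float)
    # Excess loss per domain, truncated at 0 as in the paper's pseudo-code.
    diff = np.maximum(np.asarray(current_losses) - np.asarray(reference_losses), 0.0)
    # Exponential (multiplicative-weights) re-scaling of the previous proportions.
    new_weights = prev_weights * np.exp(diff)
    new_weights = new_weights / new_weights.sum()
    # Add the small uniform smoothing proportion to each domain, then renormalize.
    new_weights = new_weights + c
    return new_weights / new_weights.sum()
```

With the 7 RedPajama domains, c = 1e-4/7 presumably corresponds to spreading a total mass of 1e-4 uniformly across domains, so it only nudges the proportions away from degenerate values.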