microsoft / torchscale

Foundation Architecture for (M)LLMs
https://aka.ms/GeneralAI
MIT License

[Minor issue] Discrepancy inside arxiv paper #82

Open radarFudan opened 8 months ago

radarFudan commented 8 months ago

Hi, first of all, thank you for the nice work.

While reading the paper, I noticed that the weight decay value given in the appendix differs from the one stated in the main body.

https://arxiv.org/pdf/2307.08621.pdf

Since the difference between a weight decay of 0.01 and 0.05 is quite large, could you double-check the values and make them consistent? Or are these configurations for different models?
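To illustrate why the gap matters, here is a rough sketch that isolates the weight-decay term of an AdamW-style optimizer with decoupled decay, where each step scales the weights by `(1 - lr * weight_decay)`. The optimizer choice, learning rate, and step count are my own assumptions for illustration and are not taken from the paper; gradient updates are ignored entirely.

```python
# Hypothetical illustration: cumulative shrink caused by decoupled weight decay
# alone (AdamW-style rule w <- w * (1 - lr * wd) once per step), ignoring
# the gradient part of the update. Values below are assumptions, not the paper's.

def decay_factor(lr: float, weight_decay: float, steps: int) -> float:
    """Return the multiplicative factor applied to a weight after `steps` steps."""
    if not 0 <= lr * weight_decay < 1:
        raise ValueError("lr * weight_decay must lie in [0, 1)")
    return (1.0 - lr * weight_decay) ** steps


# Assumed hyperparameters, for illustration only.
LR = 3e-4
STEPS = 100_000

for wd in (0.01, 0.05):
    print(f"weight_decay={wd}: weights scaled by {decay_factor(LR, wd, STEPS):.3f}")
```

Because the per-step decay term is `lr * weight_decay`, going from 0.01 to 0.05 makes it five times stronger at any fixed learning rate, and the effect compounds over training.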