nlpxucan / WizardLM

LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath

Resize embeddings so they are divisible by 64 #123

Open acforvs opened 1 year ago

acforvs commented 1 year ago

Hi, thanks for open sourcing the project!

Currently, the embedding size for StarCoder is 49152, but after one token is added it grows to 49153, which makes it impossible to shard the model evenly across any conventional number of GPUs (such as 4 or 8).

I wonder whether it would be correct to add 7/15/63 extra random tokens, as done here https://github.com/nlpxucan/WizardLM/blob/main/WizardCoder/src/train_wizardcoder.py#L194, so that the model can be sharded.

Do you have any suggestions about whether this seems reasonable? Thanks!
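For what it's worth, the padding arithmetic here can be sketched as follows (a minimal illustration, with a hypothetical helper name; it is not code from this repo):

```python
def pad_vocab_to_multiple(vocab_size: int, multiple: int = 64) -> int:
    """Return how many dummy tokens to add so vocab_size becomes a multiple."""
    return (-vocab_size) % multiple

# StarCoder's vocab after adding one special token: 49152 + 1 = 49153
extra = pad_vocab_to_multiple(49153)
assert extra == 63                 # pads the vocab back up to 49216 = 769 * 64
assert (49153 + extra) % 64 == 0   # now divisible by 64 (and by 4, 8, 16 GPUs)
```

So the "7/15/63" options in the linked code correspond to rounding the vocab up to the next multiple of 8, 16, or 64 respectively.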

ChiYeungLaw commented 1 year ago

I think this is reasonable.