sail-sg / scaling-with-vocab

[NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623

Experiments with Larger Vocabularies for Llama 2 Models? #2

Open wdlctc opened 1 month ago

wdlctc commented 1 month ago

Thank you for this interesting study on vocabulary scaling laws.

I'm curious if you ran any experiments comparing the performance of Llama 2 models with larger vocabularies as predicted by your approaches - specifically Llama 2 7B with a 57K vocabulary, Llama 2 13B with a 79K vocabulary, and Llama 2 70B with a 216K vocabulary.

If so, how did the results compare to the original Llama 2 models with their 32K vocabularies? If not, do you have plans to conduct such experiments in future work? Is this bottlenecked by the GPU memory wall?

This isn't shown in the paper, but if memory is the bottleneck, I think I can help with this issue.

It would be valuable to see empirical validation of your predictions on these widely-used model scales. Thank you!
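As a rough illustration (my own back-of-the-envelope, not the paper's actual prediction procedure), the three predicted vocabulary sizes above follow an approximate power law in model size. A minimal sketch, assuming a simple log-log linear fit; the 34B extrapolation is purely hypothetical:

```python
import numpy as np

# Predicted vocabulary sizes quoted above (illustrative only; see the
# paper / repo for the actual prediction approaches).
model_params = np.array([7e9, 13e9, 70e9])    # Llama 2 model sizes
vocab_sizes  = np.array([57e3, 79e3, 216e3])  # predicted vocabulary sizes

# Fit log(V) = gamma * log(N) + c, i.e. V is proportional to N^gamma.
gamma, c = np.polyfit(np.log(model_params), np.log(vocab_sizes), deg=1)
print(f"fitted exponent gamma ~ {gamma:.2f}")

# Extrapolate to a hypothetical 34B model (not an official prediction).
n = 34e9
print(f"rough vocab estimate for 34B: {np.exp(gamma * np.log(n) + c):,.0f}")
```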

SivilTaram commented 1 month ago

Hello @wdlctc, thank you for your interest in our work! We appreciate your inquiry regarding experiments on 7B-level models. Due to budget constraints, we haven't been able to conduct these specific experiments yet. However, we will provide more insights on 7B-level models in the camera-ready version of our paper. We'd be very grateful if any sponsorship opportunities arise to support these experiments. Thanks!