openlm-research / open_llama

OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset
Apache License 2.0

custom 50M model #82

Closed · SinanAkkoyun closed this issue 1 year ago

SinanAkkoyun commented 1 year ago

Hey! :) I wanted to ask, what goes into making your own 50M (or x-param) model from the LLaMA architecture? Completely disregarding pretraining, just random weights, what do I need to look out for to make inference work for such models? I am specifically interested in GPTQ-quantizing it and running it with, for example, ExLlama.

So, what hyperparams should I choose and what do I need to look out for? Thank you for your time.

young-geng commented 1 year ago

I think you just need to change the hidden size, the number of attention heads, and the number of layers to obtain a smaller model. Inference should just work with the transformers LLaMA implementation. You can take a look at our 3B configuration here.
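For reference, here is a minimal sketch of a small LLaMA-style config with the Hugging Face transformers API. The specific sizes (hidden_size=512, 8 layers, 8 heads, etc.) are illustrative choices, not taken from any released OpenLLaMA checkpoint; with an untied 32k-token embedding they land roughly in the 50-60M parameter range:

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Hypothetical ~50M-parameter configuration; sizes are illustrative only.
config = LlamaConfig(
    vocab_size=32000,            # standard LLaMA tokenizer vocabulary
    hidden_size=512,
    intermediate_size=1376,      # roughly 8/3 * hidden_size, as in LLaMA's SwiGLU MLP
    num_hidden_layers=8,
    num_attention_heads=8,       # head_dim = 512 / 8 = 64
    max_position_embeddings=2048,
    rms_norm_eps=1e-6,
)

# Randomly initialized weights; no pretraining involved.
model = LlamaForCausalLM(config)
print(f"{model.num_parameters() / 1e6:.1f}M parameters")
```

Such a model loads and runs with the standard generate() API; it will of course produce gibberish until it is trained.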

SinanAkkoyun commented 1 year ago

@young-geng Thank you very much! But which hyperparameters are preferred? Is there a rule of thumb for the number of attention heads, hidden size, etc.?

young-geng commented 1 year ago

There's usually no agreed-upon configuration for such small models, so you have a lot of freedom in defining it yourself. Maybe you can get some inspiration from Table A9 of the Chinchilla paper.
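When picking dimensions, a rough back-of-the-envelope parameter count can help hit a target size. Below is a minimal sketch assuming a LLaMA-style decoder with untied input/output embeddings and a SwiGLU MLP; it is an approximation for sizing, not an exact count:

```python
def llama_param_count(vocab: int, d_model: int, n_layers: int, d_ff: int) -> int:
    """Approximate parameter count for a LLaMA-style decoder (untied embeddings)."""
    embeddings = 2 * vocab * d_model          # token embedding + LM head
    attention  = 4 * d_model * d_model        # q, k, v, o projections per layer
    mlp        = 3 * d_model * d_ff           # gate, up, down projections (SwiGLU)
    norms      = 2 * d_model                  # two RMSNorm weights per layer
    return embeddings + n_layers * (attention + mlp + norms) + d_model  # + final norm

# e.g. the illustrative config above: ~58M parameters
print(llama_param_count(32000, 512, 8, 1376) / 1e6)
```

Note that at this scale the embedding and output matrices dominate the total, so the vocabulary size matters as much as depth or width.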

SinanAkkoyun commented 1 year ago

Thank you very much! :)