Take an LLM, reduce the `hidden_size` of its matrices, and then overfit it to some text.
This produces a lightweight version of the same architecture, suitable for testing.
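A back-of-the-envelope parameter count shows why a `hidden_size` of 64 lands in the tens of MB. The sketch below uses standard transformer bookkeeping; the layer counts and widths are illustrative assumptions, not this repo's exact settings:

```python
# Back-of-the-envelope parameter count for a Llama-style decoder.
# Layer/width values below are illustrative assumptions, not this repo's exact settings.

def llama_params(hidden: int, intermediate: int, layers: int, vocab: int) -> int:
    """Approximate count: attention + MLP projections per layer, plus embeddings."""
    attn = 4 * hidden * hidden          # q, k, v, o projections (ignores GQA savings)
    mlp = 3 * hidden * intermediate     # gate, up, down projections
    embed = 2 * vocab * hidden          # input embeddings + untied LM head
    return layers * (attn + mlp) + embed

full = llama_params(4096, 14336, 32, 128256)  # roughly Llama-3-8B scale
tiny = llama_params(64, 256, 2, 128256)       # reduced: hidden_size=64

print(f"full: {full / 1e9:.1f}B params, tiny: {tiny / 1e6:.1f}M params")
```

At fp32, a model in the ~15–20 M parameter range is only a few tens of MB on disk, consistent with base models small enough for CI-style testing.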
Reduced models can be found in this HF ggml-org repo. Currently supported LLMs:
| Architecture | HF repo | hidden size | base (MB) | lora (MB) |
|---|---|---|---|---|
| Phi3ForCausalLM | microsoft/Phi-3-mini-4k-instruct | 64 | 20 | 12 |
| LlamaForCausalLM | meta-llama/Meta-Llama-3-8B-Instruct | 64 | 68 | 52 |
| Gemma2ForCausalLM | google/gemma-2-2b | 64 | 77 | 5 |
```
make HF_REPO=<your hf model repo> run
```

sets up the repo and then, for each `<model-name>`:

- Downloads `<model-name>` from HF.
- Reduces its size (the `base` model).
- Overfits the `base` model to a paragraph of text, and a `lora` adapter (on top of the `base` model) to a different paragraph of text.
- Uploads the results to `<your hf model repo>`, via a user write access token to be set as the environment variable `HF_TOKEN`.
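The token wiring can be sketched as follows; the token value and repo name are placeholders, and `HF_TOKEN` must be a *write* token for the target repo:

```shell
# Hypothetical invocation: export a write token, then run the pipeline.
export HF_TOKEN=hf_xxxxxxxxxxxxxxxx          # user write access token (placeholder)
make HF_REPO=your-username/tiny-test-models run
```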
Environment (`poetry` required):

```
make setup
```
To run the full script for a specific model, run:

```
python reduce_llms_for_testing/main.py -m "<model-name>" -hf "<your hf model repo>"
```