Jerboa: fine-tune LLMs
Jerboa is an experimental repo for fine-tuning several open-source LLMs (llama, falcon, ...) on several datasets (alpaca, code-alpaca, ...).
The repo is shared publicly so that the community can reproduce our results. It is still very experimental, and many breaking changes will happen. This is not production-ready software. Check out finetuner for production-ready software.
Credits: this project is originally a fork of the great alpaca lora repo.
First, install poetry:
pip install -U poetry
If you encounter a keyring error on a GPU machine, run:
poetry run python -m pip install keyring
poetry run python -m keyring --disable
You don't need to set up a virtual env; poetry will take care of it.
poetry install
This is needed to fix the OOM problem.
On a GPU machine, finish the torch installation (CUDA build):
pip install torch
Then activate the virtual env:
poetry shell
Install the pre-commit hook:
pre-commit install
To follow the rest of the README, you need to be in the jerboa folder:
cd jerboa
This folder is the fork of the great alpaca lora repo.
The rest of this README is the original README from that repo.
Important: to follow the rest, make sure you have activated your virtual env with poetry (see above).
To run this repository on runpod, use the latest PyTorch container on runpod. Connect to the VM via SSH, then run the following command to install the necessary dependencies and log in to GitHub. After that, you can continue with the training and inference steps explained below.
bash <(curl -Ls https://raw.githubusercontent.com/jina-ai/jerboa/main/script/config.sh)
To start a training run and automatically shut down the runpod afterwards, run the following command in a screen session on runpod. ATTENTION: the runpod shuts down immediately if you run the command before logging in to WandB.
./training_run.sh "python <your training run setup>"
You can run the code in debug mode, which lets you test the code with few resources (a small model and a small dataset).
CUDA_VISIBLE_DEVICES=0 python finetune.py --debug
Debug mode still uses wandb. To disable wandb, run:
CUDA_VISIBLE_DEVICES=0 python finetune.py --debug --no-use-wandb
The model can be trained on multiple GPUs, which speeds up training. Training on 2x3090 GPUs:
WORLD_SIZE=2 CUDA_VISIBLE_DEVICES=1,2 torchrun --nproc_per_node=2 --master_port=1234 finetune.py --base-model 'yahma/llama-7b-hf' --output-dir './lora-alpaca' --batch-size 128 --micro-batch-size 4 --eval-limit 30 --eval-file eval.jsonl --wandb-log-model --wandb-project jerboa --wandb-run-name jerboa-intial-train --wandb-watch gradients --num-epochs 3
Training on 3x3090 GPUs:
WORLD_SIZE=3 CUDA_VISIBLE_DEVICES=0,1,2 torchrun --nproc_per_node=3 --master_port=1234 finetune.py --base-model 'yahma/llama-7b-hf' --output-dir './lora-alpaca' --batch-size 128 --micro-batch-size 4 --eval-limit 30 --eval-file eval.jsonl --wandb-log-model --wandb-project jerboa --wandb-run-name jerboa-intial-train --wandb-watch gradients --num-epochs 3
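The two commands above keep --batch-size 128 and --micro-batch-size 4 while changing the number of GPUs. The sketch below shows how these values are commonly combined into gradient accumulation steps. It assumes the convention of the upstream alpaca lora finetune script (batch size divided by micro batch size, then by the number of processes); jerboa's finetune.py may differ, and the function name is hypothetical.

```python
def gradient_accumulation_steps(batch_size: int, micro_batch_size: int, world_size: int = 1) -> int:
    """Assumed convention: accumulate micro batches until the global batch size is reached."""
    steps = batch_size // micro_batch_size
    if world_size > 1:
        # Under data-parallel training each process handles its own micro batches.
        steps = steps // world_size
    return steps


for gpus in (1, 2, 3):
    steps = gradient_accumulation_steps(128, 4, gpus)
    effective = steps * 4 * gpus  # samples contributing to one optimizer step
    print(f"{gpus} GPU(s): {steps} accumulation steps, effective batch size {effective}")
```

Under this assumption, integer division means that 3 GPUs give an effective batch size of 120 rather than 128, which is worth keeping in mind when comparing runs.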
Currently, the training pipeline supports the following training datasets:
yahma/alpaca-cleaned: a cleaned version of the alpaca dataset, available on the HF datasets hub. This is the dataset used by default.
sahil2801/CodeAlpaca-20k: a dataset of 20k code snippets, available on the HF datasets hub. To use this dataset, specify the following parameter in the training command: --data-path "sahil2801/CodeAlpaca-20k"
togethercomputer/RedPajama-Data-Instruct: this dataset is provided by togethercomputer and contains 2 subsets:
--data-path togethercomputer/RedPajama-Data-Instruct --data-files data/NI_decontaminated.jsonl.zst
--data-path togethercomputer/RedPajama-Data-Instruct --data-files data/P3_decontaminated.jsonl.zst
databricks/databricks-dolly-15k: a dataset of 15k instructions, available on the HF datasets hub. To use this dataset, specify the following parameter in the training command: --data-path "databricks/databricks-dolly-15k"
You can also use a different dataset if it follows the alpaca dataset format. If it follows a format similar to one of the supported formats above, you can specify one of the existing dataset preprocessors to transform it to the alpaca format during training.
Just add the following flags:
--data-path curated_dataset_name --data-files curated_dataset_data_files --dataset-preprocessor redpajamas_ni_to_alpaca_format
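To illustrate what such a preprocessor does, here is a minimal sketch that maps one record from the databricks-dolly-15k layout (instruction, context, response, category) to the alpaca layout (instruction, input, output). The function name dolly_to_alpaca_format is hypothetical and is not one of the preprocessors shipped with this repo.

```python
from typing import Dict


def dolly_to_alpaca_format(record: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical preprocessor: rename Dolly-style keys to the alpaca keys."""
    return {
        "instruction": record["instruction"],
        # Dolly stores optional supporting text under "context"; alpaca calls it "input".
        "input": record.get("context", ""),
        "output": record["response"],
    }


example = {
    "instruction": "Summarize the text.",
    "context": "Jerboas are hopping desert rodents.",
    "response": "Jerboas hop and live in deserts.",
    "category": "summarization",
}
alpaca_record = dolly_to_alpaca_format(example)
print(alpaca_record)
```

Fields that have no alpaca counterpart, such as category, are simply dropped in this sketch.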
You can run our tests from the root folder with:
CUDA_VISIBLE_DEVICES=0 pytest tests
This should take a couple of seconds on a single 3090, since it only runs one epoch over 100 data points.
You need to specify lora_target_modules for each model that is used.
For Falcon 7b: lora_target_modules=["query_key_value"]
For Llama 7b: lora_target_modules=["q_proj", "v_proj"]
However, on the command line the target modules need to be passed as individual arguments, with one --lora-target-modules flag per module.
See the evaluation command later in this README for an illustration.
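As a small illustration of the list-to-flags mapping, the sketch below expands a Python list of module names into repeated --lora-target-modules arguments. The helper name is hypothetical; only the flag name comes from this README.

```python
from typing import List


def lora_target_module_args(modules: List[str]) -> List[str]:
    """Repeat the flag once per module, as the command line expects."""
    args: List[str] = []
    for module in modules:
        args += ["--lora-target-modules", module]
    return args


llama_args = lora_target_module_args(["q_proj", "v_proj"])
falcon_args = lora_target_module_args(["query_key_value"])
print(" ".join(llama_args))   # --lora-target-modules q_proj --lora-target-modules v_proj
print(" ".join(falcon_args))  # --lora-target-modules query_key_value
```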
To run an evaluation, you first need an evaluation file or dataset. Each record in the evaluation file looks like the following:
{"id": "user_oriented_task_0", "motivation_app": "Grammarly", "instruction": "The sentence you are given might be too wordy, complicated, or unclear. Rewrite the sentence and make your writing clearer by keeping it concise. Whenever possible, break complex sentences into multiple sentences and eliminate unnecessary words.", "instances": [{"input": "If you have any questions about my rate or if you find it necessary to increase or decrease the scope for this project, please let me know.", "output": "If you have any questions about my rate or find it necessary to increase or decrease this project's scope, please let me know."}]}
You can download self-instruct evaluation data using this command:
wget https://raw.githubusercontent.com/yizhongw/self-instruct/main/human_eval/user_oriented_instructions.jsonl
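To inspect the file before training, a minimal sketch like the one below reads JSONL lines in the record format shown above and flattens each record into instruction, input, and output. This is not the loader used by finetune.py; the function name is hypothetical, and the limit here counts records, which is an assumption about how an evaluation limit could be applied.

```python
import json
from typing import Dict, Iterable, Iterator, Optional


def iter_eval_examples(lines: Iterable[str], limit: Optional[int] = None) -> Iterator[Dict[str, str]]:
    """Yield one flat example per instance, stopping after `limit` records."""
    seen = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if limit is not None and seen >= limit:
            break
        record = json.loads(line)
        seen += 1
        for instance in record["instances"]:
            yield {
                "instruction": record["instruction"],
                "input": instance["input"],
                "output": instance["output"],
            }


# In-memory sample with the same keys as the record shown above.
sample_lines = [
    json.dumps({
        "id": "user_oriented_task_0",
        "motivation_app": "Grammarly",
        "instruction": "Rewrite the sentence concisely.",
        "instances": [{"input": "Hello there, hello.", "output": "Hello."}],
    })
]
examples = list(iter_eval_examples(sample_lines, limit=5))
print(examples[0]["instruction"])
```

To read the downloaded file instead, open user_oriented_instructions.jsonl and pass the file object as lines.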
To run evaluation after finetuning you can use the following command:
CUDA_VISIBLE_DEVICES=2 \
python finetune.py \
--base-model 'yahma/llama-7b-hf' \
--lora-target-modules q_proj \
--lora-target-modules v_proj \
--data-path <Your-data-path> \
--output-dir './lora-alpaca' \
--wandb-project 'jerboa' \
--wandb-run-name 'test-run' \
--wandb-watch 'gradients' \
--wandb-log-model \
--num-epochs '2' \
--eval-file 'user_oriented_instructions.jsonl' \
--eval-limit '5'
--eval-file: path to the evaluation file
--eval-limit: number of examples to evaluate on
Evaluation results will be automatically logged to wandb.
You can also serve a gradio app to showcase your model. This can be either a pre-trained model or a fine-tuned model. To serve a gradio app with your specific lora weights, run the following from the gradio repo:
poetry run python app.py --base-model <wandb path to base_model> --lora-repo <path to wandb or hf adapter weights>
For example:
poetry run python app.py --base-model tiiuae/falcon-7b --lora-repo wandb:jina-ai/jerboa/lora_weight:v19
See load_models.py in utils to learn how to correctly specify the path to the lora weights.
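As a rough illustration of the path format in the example above, the sketch below separates a wandb: prefix from the rest of the value. It is not the logic in load_models.py; the function name is hypothetical, and treating every value without the prefix as a Hugging Face repo is an assumption.

```python
from typing import Tuple


def split_lora_repo(spec: str) -> Tuple[str, str]:
    """Hypothetical helper: return (source, path) for a --lora-repo value."""
    prefix = "wandb:"
    if spec.startswith(prefix):
        # The remainder looks like a W&B artifact path: entity/project/name:version
        return "wandb", spec[len(prefix):]
    # Assumption: anything else is a Hugging Face repo id or local path.
    return "hf", spec


print(split_lora_repo("wandb:jina-ai/jerboa/lora_weight:v19"))
```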