Steps to reproduce
1) Start a RunPod container with the PyTorch 2.0.1 template and plenty of disk space.
2) Run your sample command on a properly formatted dataset:
python -m llamatune.train \
--model_name meta-llama/Llama-2-13b-chat-hf \
--data_path master_qa.json \
--training_recipe lora \
--batch_size 8 \
--gradient_accumulation_steps 4 \
--learning_rate 1e-4 \
--output_dir chat_llama2_13b \
--use_auth_token xxxzzz
3) The result is:
Model ready for training!
trainable params: 250347520 || all params: 6922337280 || trainable: 3.616517223500557
WARNING:root:Loading data...
WARNING:root:Tokenizing inputs... This may take some time...
config TrainingConfig(model_name='meta-llama/Llama-2-13b-chat-hf', data_path='master_qa.json', output_dir='chat_llama2_13b', training_recipe='lora', optim='paged_adamw_8bit', batch_size=8, gradient_accumulation_steps=4, n_epochs=3, weight_decay=0.0, learning_rate=0.0001, max_grad_norm=0.3, gradient_checkpointing=True, do_train=True, lr_scheduler_type='cosine', warmup_ratio=0.03, logging_steps=1, group_by_length=True, save_strategy='epoch', save_total_limit=3, fp16=True, tokenizer_type='llama', trust_remote_code=False, compute_dtype=torch.float16, max_tokens=4096, do_eval=True, evaluation_strategy='epoch', use_auth_token='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', use_fast=False, bits=4, double_quant=True, quant_type='nf4', lora_r=64, lora_alpha=16, lora_dropout=0.0)
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.10/dist-packages/llamatune/train.py", line 50, in <module>
trainer.train()
File "/usr/local/lib/python3.10/dist-packages/llamatune/trainer.py", line 25, in train
self.model_engine.train(data_module=self.data_module)
File "/usr/local/lib/python3.10/dist-packages/llamatune/model_engines/llama_model_engine.py", line 33, in train
trainer = Trainer(
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 405, in __init__
raise ValueError(
ValueError: The model you want to train is loaded in 8-bit precision. if you want to fine-tune an 8-bit model, please make sure that you have installed bitsandbytes>=0.41.1.
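To help narrow this down, here is a minimal sketch for checking which bitsandbytes version the container actually has against the minimum quoted in the error. It uses only the standard library. The helper names (`parse_version`, `installed_version`, `meets_minimum`) are my own and not part of llamatune or transformers, and the simple version parsing is an assumption that ignores pre-release tags.

```python
from importlib import metadata

# Minimum version quoted in the ValueError above.
REQUIRED = (0, 41, 1)


def parse_version(text):
    """Turn '0.41.1' or '0.41.1.post2' into a tuple of its leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'post2'
        parts.append(int(digits))
    return tuple(parts)


def installed_version(package="bitsandbytes"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def meets_minimum(version_text, minimum=REQUIRED):
    """True only if a version is installed and is at least `minimum`."""
    if version_text is None:
        return False
    return parse_version(version_text) >= minimum


if __name__ == "__main__":
    found = installed_version()
    print("bitsandbytes:", found, "| meets minimum:", meets_minimum(found))
```

If this reports a missing or older version, upgrading with `pip install -U bitsandbytes` before rerunning the command may be worth trying, though I have not confirmed that this resolves the error.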