arnocandel opened this issue 1 year ago
CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 finetune.py --data_path=ehartford/dolphin --base_model=tiiuae/falcon-7b &> dolphin.7b.txt
0%| | 5/29155 [00:43<72:18:22, 8.93s/it]
CUDA_VISIBLE_DEVICES=0,1,2 torchrun --nproc_per_node=3 finetune.py --data_path=ehartford/dolphin --base_model=tiiuae/falcon-7b --train_8bit=True &> dolphin.7b.txt
0%| | 14/31099 [04:12<153:34:34, 17.79s/it]
CUDA_VISIBLE_DEVICES=0,1,2 torchrun --nproc_per_node=3 finetune.py --data_path=ehartford/dolphin --base_model=tiiuae/falcon-7b --train_4bit=True &> dolphin.7b.txt
0%| | 1/31099 [00:19<167:47:13, 19.42s/it]
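For reference, the ETAs in the progress bars above are just total steps times seconds per step; a quick back-of-the-envelope comparison of the three runs (numbers taken from the tqdm readouts, so approximate):

```python
# Rough wall-clock estimates from the tqdm readouts above (steps x s/it);
# values are the ones shown in the progress bars, treat them as approximate.
runs = {
    "16-bit, 2 GPUs": (29155, 8.93),
    "8-bit,  3 GPUs": (31099, 17.79),
    "4-bit,  3 GPUs": (31099, 19.42),
}
for name, (steps, s_per_it) in runs.items():
    hours = steps * s_per_it / 3600
    print(f"{name}: ~{hours:.0f} h total")
```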
CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 finetune.py --data_path=ehartford/dolphin --base_model=tiiuae/falcon-7b --cutoff_len=2048 --drop_truncations=True --num_epochs=0.2 --lora_target_modules='["query_key_value", "dense_h_to_4h", "dense_4h_to_h", "dense"]' &> dolphin.7b.txt
OOM on 2x48GB
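For context, the --lora_target_modules list in the command above corresponds to a PEFT LoraConfig along these lines (a minimal sketch; r/alpha/dropout here are illustrative values, not necessarily finetune.py's defaults):

```python
from peft import LoraConfig

# Illustrative LoRA config targeting all Falcon linear layers, matching the
# --lora_target_modules flag above; r/alpha/dropout are assumed values.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["query_key_value", "dense_h_to_4h", "dense_4h_to_h", "dense"],
)
```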
CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 finetune.py --data_path=ehartford/dolphin --base_model=tiiuae/falcon-7b --cutoff_len=2048 --drop_truncations=True --num_epochs=0.2 &> dolphin.7b.txt
OOM as well, so a 2048 cutoff is too much without quantization
CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 finetune.py --data_path=ehartford/dolphin --base_model=tiiuae/falcon-7b --cutoff_len=1536 --drop_truncations=True --num_epochs=0.2 &> dolphin.7b.txt
OOM
CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 finetune.py --data_path=ehartford/dolphin --base_model=tiiuae/falcon-7b --cutoff_len=2048 --drop_truncations=True --num_epochs=0.2 --train_8bit=True
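The --train_8bit path amounts to loading the frozen base model quantized with bitsandbytes before attaching the LoRA adapter; a minimal sketch of that setup (not the exact finetune.py code), assuming a recent transformers/peft:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import prepare_model_for_kbit_training

base_model = "tiiuae/falcon-7b"
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Load the frozen base weights in 8-bit so 2048-token sequences fit on 48GB cards.
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,  # Falcon shipped custom modeling code at the time
)
# Cast/freeze layers for stable k-bit LoRA training; the LoRA adapter
# (as in the LoraConfig sketch above) is then attached via peft.get_peft_model.
model = prepare_model_for_kbit_training(model)
```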
Probably overfit on instructions:
CUDA_VISIBLE_DEVICES=2 python generate.py --base_model=tiiuae/falcon-7b --lora_weight=falcon-7b.ehartforddolphin.0.2_epochs.0b8d30ad31bcb7762468f8f5fa6c46f04451caad.0/checkpoint-2296 --load_8bit=True --prompt_type=human_bot
LoRA checkpoints and logs: https://slack-files.com/T0329MHH6-F05G3HMMB32-5ccc51d96c
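Loading such a checkpoint for generation boils down to applying the saved LoRA adapter on top of the 8-bit base model; a rough sketch of what that looks like (not generate.py itself; the local adapter path and the prompt format are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "tiiuae/falcon-7b"
# Assumed local path to the adapter checkpoint linked above.
adapter = "falcon-7b.ehartforddolphin.0.2_epochs.0b8d30ad31bcb7762468f8f5fa6c46f04451caad.0/checkpoint-2296"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, load_in_8bit=True, device_map="auto", trust_remote_code=True
)
# Attach the saved LoRA adapter to the frozen 8-bit base model.
model = PeftModel.from_pretrained(model, adapter)
model.eval()

# Assumed human_bot-style prompt with <human>:/<bot>: markers.
prompt = "<human>: Why do dolphins sleep with one eye open?\n<bot>:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```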
Again, this time without training on the instructions (to avoid overfitting on them):
CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 finetune.py --data_path=ehartford/dolphin --base_model=tiiuae/falcon-7b --cutoff_len=2048 --drop_truncations=True --num_epochs=0.2 --train_8bit=True --train_on_inputs=False &> dolphin.7b.notraininstructions.log
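--train_on_inputs=False means the loss is computed only on the response tokens: the prompt/instruction part of each example gets its labels masked out. A minimal sketch of that masking (illustrative, not the exact finetune.py implementation):

```python
IGNORE_INDEX = -100  # tokens with this label are ignored by the cross-entropy loss

def mask_prompt_labels(prompt_ids, response_ids):
    """Build input_ids/labels so only the response contributes to the loss."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return {"input_ids": input_ids, "labels": labels}
```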
Logs (dolphin.7b.notraininstructions.log) and checkpoints/LoRA weights: https://slack-files.com/T0329MHH6-F05FWQ0LRUP-ea1604b9b8
130 GPU-hours
CUDA_VISIBLE_DEVICES=2 python generate.py --base_model=tiiuae/falcon-7b --lora_weight=falcon-7b.ehartforddolphin.0.2_epochs.c432387e2099171f2332a0da1126103fc549cba7.0 --load_8bit=True --prompt_type=human_bot
Again with --prompt_type=human_bot, but without cleaning up the data yet, just as a quick sanity check:
CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 finetune.py --data_path=ehartford/dolphin --base_model=tiiuae/falcon-7b --cutoff_len=2048 --drop_truncations=True --prompt_type=human_bot --num_epochs=0.01 --train_8bit=True --train_on_inputs=False &> dolphin.7b.human_bot.log
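For reference, the human_bot prompt type wraps each turn in <human>:/<bot>: markers, roughly like this (a sketch of the assumed template; exact whitespace and special tokens may differ from the actual prompter code):

```python
def human_bot_prompt(instruction: str, response: str = "") -> str:
    # Assumed shape of the human_bot template: human turn, then bot turn.
    return f"<human>: {instruction}\n<bot>: {response}"

print(human_bot_prompt("Summarize the dolphin dataset in one sentence."))
```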
All good. Now doing 0.1 epochs, still with instructions (system prompt), even though we might not need or want them, and still no personalization (yet):
PYTHONPATH=. CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 finetune.py --data_path=ehartford/dolphin --base_model=tiiuae/falcon-7b --cutoff_len=2048 --drop_truncations=True --prompt_type=human_bot --num_epochs=0.1 --train_8bit=True --train_on_inputs=False &> dolphin.7b.human_bot.log
PYTHONPATH=. CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 finetune.py --data_path=ehartford/dolphin --base_model=tiiuae/falcon-7b --cutoff_len=2048 --drop_truncations=True --prompt_type=human_bot --num_epochs=1 --train_on_inputs=False --train_4bit=True &> dolphin.7b.human_bot.log
Histograms of byte lengths of instruction/input/output for https://huggingface.co/datasets/ehartford/dolphin.
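A quick way to reproduce those histograms from the Hugging Face dataset (a sketch; it assumes load_dataset resolves the default config and that the columns are named instruction/input/output as stated above):

```python
import matplotlib.pyplot as plt
from datasets import load_dataset

ds = load_dataset("ehartford/dolphin", split="train")

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, col in zip(axes, ["instruction", "input", "output"]):
    # Byte length (UTF-8) per example, to match the histograms described above.
    lengths = [len(str(x).encode("utf-8")) for x in ds[col]]
    ax.hist(lengths, bins=100)
    ax.set_title(f"{col} (bytes)")
plt.tight_layout()
plt.savefig("dolphin_byte_lengths.png")
```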