Closed: arnocandel closed this issue 1 year ago.
torchrun --nproc_per_node=8 finetune.py --data_path=h2oai/h2ogpt-oig-oasst1-instruct-cleaned-v3 --data_mix_in_path=h2oai/h2ogpt-fortune2000-personalized --data_mix_in_factor=1.0 --data_mix_in_col_dict='{}' --data_mix_in_prompt_type=plain --drop_truncations=True --train_8bit=False --base_model=h2oai/h2ogpt-oasst1-512-12b --micro_batch_size=2 --batch_size=64 --num_epochs=1 --run_id=2 &> log.2.txt
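The batch layout above implies gradient accumulation; a minimal sketch of the arithmetic, assuming `finetune.py` follows the common alpaca-lora-style convention of deriving accumulation steps as `batch_size / (micro_batch_size * world_size)` (an assumption about the script, not confirmed from it):

```python
# Hedged sketch: assumes finetune.py derives gradient-accumulation steps as
# batch_size / (micro_batch_size * world_size), the usual convention in
# alpaca-lora-style training scripts.

def accumulation_steps(batch_size: int, micro_batch_size: int, world_size: int) -> int:
    """Forward/backward passes accumulated before each optimizer step."""
    per_step = micro_batch_size * world_size  # examples consumed per forward pass
    assert batch_size % per_step == 0, "batch_size must divide evenly"
    return batch_size // per_step

# The 12B run above: --batch_size=64 --micro_batch_size=2 on 8 GPUs
print(accumulation_steps(64, 2, 8))  # 4
```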
1%| | 73/8195 [01:58<3:38:24, 1.61s/it]
8xA100 80GB
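The ETA in the progress line above is roughly remaining iterations times seconds per iteration; a quick sanity check (tqdm smooths the rate internally, so its displayed ETA differs slightly):

```python
# Rough check of the tqdm readout above: remaining iterations * seconds/iteration.
done, total, sec_per_it = 73, 8195, 1.61
eta_seconds = (total - done) * sec_per_it
hours, rem = divmod(int(eta_seconds), 3600)
minutes, seconds = divmod(rem, 60)
print(f"{hours}:{minutes:02d}:{seconds:02d}")  # 3:37:56, vs. 3:38:24 shown
```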
23%|██▎ | 1844/8195 [1:15:18<13:11:18, 7.48s/it]
LoRA weights and logs: https://huggingface.co/h2oai/h2ogpt-oig-oasst1-512-12b/blob/main/h2ogpt-oasst1-512-12b.h2oaih2ogpt-oig-oasst1-instruct-cleaned-v3.1_epochs.805b8e8eff369207340a5a6f90f3c833f9731254.2.zip
https://huggingface.co/h2oai/h2ogpt-oig-oasst1-512-12b
CUDA_VISIBLE_DEVICES=0 python main.py --model hf-causal-experimental --model_args pretrained=h2oai/h2ogpt-oig-oasst1-512-12b --tasks openbookqa,arc_easy,winogrande,hellaswag,arc_challenge,piqa,boolq --device cuda &> h2ogpt-oig-oasst1-512-12b.eval.2.log
h2ogpt-oig-oasst1-512-12b.eval.2.log
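When comparing checkpoints across the seven harness tasks above, an unweighted mean of the per-task accuracies gives a convenient single number; a sketch with placeholder scores (the values below are illustrative, not results from the attached log):

```python
# Hedged sketch: unweighted mean over the lm-evaluation-harness tasks run above.
# Accuracy values are placeholders, not results from the eval log.
task_acc = {
    "openbookqa": 0.30, "arc_easy": 0.60, "winogrande": 0.62,
    "hellaswag": 0.50, "arc_challenge": 0.32, "piqa": 0.74, "boolq": 0.65,
}
mean_acc = sum(task_acc.values()) / len(task_acc)
print(f"{mean_acc:.4f}")  # 0.5329
```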
ShareGPT eval before: avg 0.47 median 0.43 https://github.com/h2oai/h2ogpt/issues/73#issuecomment-1518907557
CUDA_VISIBLE_DEVICES=0 python generate.py --base_model=h2oai/h2ogpt-oig-oasst1-512-12b --prompt_type='human_bot' --chat=False --stream_output=False --gradio=False --eval_sharegpt_prompts_only=500 --eval_sharegpt_as_output=False --num_beams=1 &> h2ogpt-oig-oasst1-512-12b.eval.log
h2ogpt-oig-oasst1-512-12b.eval.log
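The avg/median figures quoted for these ShareGPT evals are plain mean and median over the per-prompt grades; a minimal sketch (the grades below are made up, not the actual 500-prompt results):

```python
# Hedged sketch of the ShareGPT-eval summary stats: plain mean and median
# over per-prompt grades in [0, 1]. Grades below are illustrative only.
from statistics import mean, median

grades = [0.2, 0.35, 0.4, 0.45, 0.5, 0.6, 0.75]  # hypothetical per-prompt grades
print(f"avg {mean(grades):.2f} median {median(grades):.2f}")  # avg 0.46 median 0.45
```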
Indeed much worse than before; confirmed again here:
CUDA_VISIBLE_DEVICES=0 python generate.py --base_model=h2oai/h2ogpt-oasst1-512-12b --prompt_type='human_bot' --chat=False --stream_output=False --gradio=False --eval_sharegpt_prompts_only=500 --eval_sharegpt_as_output=False --num_beams=1
torchrun --nproc_per_node=8 finetune.py --val_set_size=0 --eval_steps=32000 --data_path=h2oai/h2ogpt-oig-oasst1-instruct-cleaned-v3 --data_mix_in_path=h2oai/h2ogpt-fortune2000-personalized --data_mix_in_factor=1.0 --data_mix_in_col_dict='{}' --data_mix_in_prompt_type=plain --drop_truncations=True --train_8bit=True --base_model=h2oai/h2ogpt-oasst1-512-20b --micro_batch_size=8 --batch_size=64 --num_epochs=1 --run_id=3 &> log.3.txt
https://huggingface.co/h2oai/h2ogpt-oig-oasst1-512-20b/ has LoRA weights; not evaluated yet, but likely bad too, since there was too much pretraining relative to fine-tuning.
torchrun --nproc_per_node=2 finetune.py --data_path=h2oai/openassistant_oasst1_h2ogpt_graded --drop_truncations=True --train_8bit=False --base_model=EleutherAI/pythia-12b-deduped --micro_batch_size=4 --batch_size=128 --num_epochs=3 --run_id=4 &> log.4.txt
0%| | 4/990 [00:25<1:44:47, 6.38s/it]
2xA6000Ada
log.4.txt
Eval to confirm all is good:
(env) arno@rippa:/nfs4/llm/h2ogpt(main)$ CUDA_VISIBLE_DEVICES=0 python generate.py --base_model=h2oai/h2ogpt-oasst1-512-12b --prompt_type='human_bot' --chat=False --stream_output=False --gradio=False --eval_sharegpt_prompts_only=500 --eval_sharegpt_as_output=False --num_beams=2 &> h2ogpt-oasst1-512-12b.eval.log
h2ogpt-oasst1-512-12b.eval.log
(env) arno@rippa:/nfs4/llm/lm-evaluation-harness(master)$ CUDA_VISIBLE_DEVICES=1 python main.py --model hf-causal-experimental --model_args pretrained=h2oai/h2ogpt-oasst1-512-12b --tasks openbookqa,arc_easy,winogrande,hellaswag,arc_challenge,piqa,boolq --device cuda &> h2ogpt-oasst1-512-12b.eval.log
h2ogpt-oasst1-512-12b.eval.log
torchrun --nproc_per_node=8 finetune.py --data_path=h2oai/openassistant_oasst1_h2ogpt_graded --drop_truncations=True --train_8bit=False --base_model=EleutherAI/gpt-neox-20b --micro_batch_size=1 --batch_size=128 --num_epochs=3 --run_id=5 &> log.5.txt
0%| | 2/495 [00:12<50:48, 6.18s/it]
8xA100 80GB
log.5.txt
(env) arno@rippa:/nfs4/llm/h2ogpt(main)$ CUDA_VISIBLE_DEVICES=0,1 python generate.py --base_model=h2ogpt-oasst1-512-20b --prompt_type='human_bot' --chat=False --stream_output=False --gradio=False --eval_sharegpt_prompts_only=500 --eval_sharegpt_as_output=False --num_beams=2 --infer_devices=False &> h2ogpt-oasst1-512-20b.eval.log
h2ogpt-oasst1-512-20b.eval.log
Got worse; was 0.46/0.44 mean/median before: https://github.com/h2oai/h2ogpt/issues/127
The next iteration will likely depend on a reimplementation of WizardLM.
6.9B: additional fine-tuning on 500k rows
CUDA_VISIBLE_DEVICES="0,1" torchrun --nproc_per_node=2 finetune.py --data_path=h2oai/h2ogpt-oig-oasst1-instruct-cleaned-v3 --data_mix_in_path=h2oai/h2ogpt-fortune2000-personalized --data_mix_in_factor=1.0 --data_mix_in_col_dict='{}' --data_mix_in_prompt_type=plain --drop_truncations=True --train_8bit=False --base_model=h2oai/h2ogpt-oig-oasst1-512-6.9b --micro_batch_size=2 --batch_size=64 --num_epochs=1 --run_id=1 &> log.1.txt
https://slack-files.com/T0329MHH6-F056R5NHRPG-0d2ba25c7a lora weights and logs (~8 hours on 2xA6000 Ada)
CUDA_VISIBLE_DEVICES=0 torchrun main.py --model hf-causal --model_args pretrained=../h2ogpt/h2ogpt-oig-oasst1-512-6.9b --tasks openbookqa,arc_easy,winogrande,hellaswag,arc_challenge,piqa,boolq --device cuda &> h2ogpt-oig-oasst1-512-6.9b.eval.log
Similar to https://github.com/h2oai/h2ogpt/issues/35#issuecomment-1520876120, just slightly worse:
![image_720](https://github.com/h2oai/h2ogpt/assets/6147661/a6fd40a2-889c-44e0-9f84-6f8ef6958854)
CUDA_VISIBLE_DEVICES=0 python generate.py --base_model=h2oai/h2ogpt-oig-oasst1-512-6.9b --lora_weights=h2ogpt-oig-oasst1-512-6.9b.h2oaih2ogpt-oig-oasst1-instruct-cleaned-v3.1_epochs.e48f9debb0d2bd8d866fa5668bbbb51c317c553c.1
ShareGPT eval before: avg 0.43 median 0.37 https://github.com/h2oai/h2ogpt/issues/73#issuecomment-1518699369
After:
![df_scores_100_100_1234_False_h2ogpt-oig-oasst1-512-6 9b_](https://github.com/h2oai/h2ogpt/assets/6147661/727b4361-492e-466e-8f1e-8373b6cf6973)
CUDA_VISIBLE_DEVICES=0 python generate.py --base_model=h2oai/h2ogpt-oig-oasst1-512-6.9b --prompt_type='human_bot' --chat=False --stream_output=False --gradio=False --eval_sharegpt_prompts_only=100 --eval_sharegpt_as_output=False --num_beams=1 &> h2ogpt-oig-oasst1-512-6.9b.eval.2.log
h2ogpt-oig-oasst1-512-6.9b.eval.2.log