ymcui / Chinese-LLaMA-Alpaca-2

中文LLaMA-2 & Alpaca-2大模型二期项目 + 64K超长上下文模型 (Chinese LLaMA-2 & Alpaca-2 LLMs with 64K long context models)
Apache License 2.0

Why does instruction fine-tuning (run_sft.sh) on a small amount of data show no effect after merging the model? #472

Closed Kris-rod closed 6 months ago

Kris-rod commented 7 months ago

Items that must be checked before submitting

Issue type

Model training and fine-tuning

Base model

Chinese-Alpaca-2 (7B/13B)

Operating system

Linux

Detailed description of the problem

Below are my training parameters and the command I ran. I used only a dozen or so JSON examples to teach the model its name, but after training and merging the model there is still no effect.

lr=1e-4
lora_rank=16
lora_alpha=16
lora_trainable="q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj"
modules_to_save="embed_tokens,lm_head"
lora_dropout=0.05
pretrained_model="/home/ubuntu/LLama/chinese-alpaca-2-7b-hf"
chinese_tokenizer_path="/home/ubuntu/LLama/chinese-alpaca-2-7b-hf"
dataset_dir="/home/ubuntu/LLama/ih_test"
per_device_train_batch_size=1
per_device_eval_batch_size=1
gradient_accumulation_steps=8
max_seq_length=512
output_dir="/home/ubuntu/LLama/finetuning-model"
peft_model="/home/ubuntu/LLama/Chinese-LLaMA-Alpaca-2/scripts/training/peft"
validation_file="/home/ubuntu/LLama/ih_test/valid.json"
deepspeed_config_file=ds_zero2_no_offload.json

torchrun --nnodes 1 --nproc_per_node 1 run_clm_sft_with_peft.py \
    --deepspeed ${deepspeed_config_file} \
    --model_name_or_path ${pretrained_model} \
    --tokenizer_name_or_path ${chinese_tokenizer_path} \
    --dataset_dir ${dataset_dir} \
    --per_device_train_batch_size ${per_device_train_batch_size} \
    --per_device_eval_batch_size ${per_device_eval_batch_size} \
    --do_train \
    --do_eval \
    --seed $RANDOM \
    --fp16 \
    --num_train_epochs 1 \
    --lr_scheduler_type cosine \
    --learning_rate ${lr} \
    --warmup_ratio 0.03 \
    --weight_decay 0 \
    --logging_strategy steps \
    --logging_steps 10 \
    --save_strategy steps \
    --save_total_limit 3 \
    --evaluation_strategy steps \
    --eval_steps 100 \
    --save_steps 200 \
    --gradient_accumulation_steps ${gradient_accumulation_steps} \
    --preprocessing_num_workers 8 \
    --max_seq_length ${max_seq_length} \
    --output_dir ${output_dir} \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --lora_rank ${lora_rank} \
    --lora_alpha ${lora_alpha} \
    --trainable ${lora_trainable} \
    --lora_dropout ${lora_dropout} \
    --torch_dtype float16 \
    --validation_file ${validation_file} \
    --load_in_kbits 8 \
    --save_safetensors False \
    --gradient_checkpointing \
    --ddp_find_unused_parameters False
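For completeness, the merge step mentioned above (folding the trained LoRA adapter back into the base weights) can be done with the repo's own merge script or with a generic PEFT snippet. Below is a minimal sketch using the PEFT API only, not the repo's official script; the adapter and output paths are hypothetical (it assumes the adapter was saved under an sft_lora_model subdirectory of output_dir), and only the base-model path is taken from this issue.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_path = "/home/ubuntu/LLama/chinese-alpaca-2-7b-hf"             # from this issue
lora_adapter_path = "/home/ubuntu/LLama/finetuning-model/sft_lora_model"  # hypothetical adapter dir
merged_output_path = "/home/ubuntu/LLama/merged-model"                    # hypothetical output dir

# Load the fp16 base model, attach the trained LoRA adapter, and fold it into the weights.
base_model = AutoModelForCausalLM.from_pretrained(base_model_path, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base_model, lora_adapter_path)
model = model.merge_and_unload()

# Save the merged checkpoint together with the tokenizer for later inference.
model.save_pretrained(merged_output_path)
AutoTokenizer.from_pretrained(base_model_path).save_pretrained(merged_output_path)
```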

Dependencies (must be provided for code-related issues)

accelerate 0.24.1 aiofiles 23.2.1 aiohttp 3.9.1 aiosignal 1.3.1 altair 5.2.0 annotated-types 0.6.0 anyio 3.7.1 async-timeout 4.0.3 attrs 23.1.0 bitsandbytes 0.41.1 Brotli 1.1.0 certifi 2023.11.17 cffi 1.16.0 charset-normalizer 3.3.2 click 8.1.7 cmake 3.27.7 colorama 0.4.6 contourpy 1.1.1 cryptography 41.0.7 cycler 0.12.1 dataclasses 0.8 dataclasses-json 0.6.3 datasets 2.15.0 deepspeed 0.12.3 dill 0.3.7 exceptiongroup 1.2.0 fastapi 0.104.1 ffmpy 0.3.1 filelock 3.13.1 fonttools 4.45.1 frozenlist 1.4.0 fsspec 2023.10.0 gradio 3.50.0 gradio_client 0.6.1 greenlet 3.0.1 h11 0.14.0 hjson 3.1.0 httpcore 1.0.2 httpx 0.25.2 huggingface-hub 0.19.4 idna 3.6 importlib-metadata 6.8.0 importlib-resources 6.1.1 Jinja2 3.1.2 joblib 1.3.2 jsonpatch 1.33 jsonpointer 2.4 jsonschema 4.20.0 jsonschema-specifications 2023.11.2 kiwisolver 1.4.5 langchain 0.0.344 langchain-core 0.0.8 langsmith 0.0.67 lit 17.0.6 markdown-it-py 3.0.0 MarkupSafe 2.1.3 marshmallow 3.20.1 matplotlib 3.7.4 mdurl 0.1.2 mpmath 1.3.0 multidict 6.0.4 multiprocess 0.70.15 mypy-extensions 1.0.0 networkx 3.1 ninja 1.11.1.1 numpy 1.24.4 nvidia-cublas-cu11 11.10.3.66 nvidia-cublas-cu12 12.1.3.1 nvidia-cuda-cupti-cu11 11.7.101 nvidia-cuda-cupti-cu12 12.1.105 nvidia-cuda-nvrtc-cu11 11.7.99 nvidia-cuda-nvrtc-cu12 12.1.105 nvidia-cuda-runtime-cu11 11.7.99 nvidia-cuda-runtime-cu12 12.1.105 nvidia-cudnn-cu11 8.5.0.96 nvidia-cudnn-cu12 8.9.2.26 nvidia-cufft-cu11 10.9.0.58 nvidia-cufft-cu12 11.0.2.54 nvidia-curand-cu11 10.2.10.91 nvidia-curand-cu12 10.3.2.106 nvidia-cusolver-cu11 11.4.0.1 nvidia-cusolver-cu12 11.4.5.107 nvidia-cusparse-cu11 11.7.4.91 nvidia-cusparse-cu12 12.1.0.106 nvidia-nccl-cu11 2.19.3 nvidia-nccl-cu12 2.18.1 nvidia-nvjitlink-cu12 12.3.101 nvidia-nvtx-cu11 11.7.91 nvidia-nvtx-cu12 12.1.105 orjson 3.9.10 packaging 23.2 pandas 2.0.3 peft 0.5.0 Pillow 10.1.0 pip 23.3.1 pkgutil_resolve_name 1.3.10 psutil 5.9.6 py-cpuinfo 9.0.0 pyarrow 14.0.1 pyarrow-hotfix 0.6 pycparser 2.21 pydantic 2.5.2 pydantic_core 2.14.5 pydub 0.25.1 Pygments 2.17.2 pynvml 11.5.0 pyOpenSSL 23.3.0 pyparsing 3.1.1 PySocks 1.7.1 python-dateutil 2.8.2 python-multipart 0.0.6 pytz 2023.3.post1 PyYAML 6.0.1 referencing 0.31.1 regex 2023.10.3 requests 2.31.0 rich 13.7.0 rpds-py 0.13.2 sacremoses 0.0.53 safetensors 0.3.3 scikit-learn 1.3.2 scipy 1.10.1 semantic-version 2.10.0 sentencepiece 0.1.99 setuptools 68.2.2 shellingham 1.5.4 six 1.16.0 sniffio 1.3.0 SQLAlchemy 2.0.23 starlette 0.27.0 sympy 1.12 tenacity 8.2.3 threadpoolctl 3.2.0 tokenizers 0.14.1 tomlkit 0.12.0 toolz 0.12.0 torch 2.1.1 tornado 6.4 tqdm 4.66.1 transformers 4.34.0 triton 2.1.0 typer 0.9.0 typing_extensions 4.8.0 typing-inspect 0.9.0 tzdata 2023.3 urllib3 2.1.0 uvicorn 0.24.0.post1 visdom 0.2.4 websocket-client 1.7.0 websockets 11.0.3 wheel 0.42.0 xxhash 3.4.1 yarl 1.9.3 zipp 3.17.0

Run logs or screenshots


qaz2709 commented 7 months ago

Try increasing the number of epochs to 5. Your training loss clearly hasn't converged yet.
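A rough sanity check of why one epoch is not enough here: with per_device_train_batch_size=1 and gradient_accumulation_steps=8 on a single GPU, a dozen or so examples yield only one or two optimizer updates per epoch. The sketch below illustrates this; the example count of 15 is an assumption.

```python
# Back-of-the-envelope count of optimizer updates for the reported setup.
num_examples = 15                 # assumed: "a dozen or so" JSON entries
per_device_train_batch_size = 1
gradient_accumulation_steps = 8
num_gpus = 1                      # --nproc_per_node 1

effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
updates_per_epoch = max(1, num_examples // effective_batch)

for epochs in (1, 5):
    print(f"{epochs} epoch(s): ~{updates_per_epoch * epochs} optimizer update(s)")
# 1 epoch(s): ~1 optimizer update(s)
# 5 epoch(s): ~5 optimizer update(s)
```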

Kris-rod commented 7 months ago

> Try increasing the number of epochs to 5. Your training loss clearly hasn't converged yet.

The problem is solved, thank you very much!

cklogic commented 7 months ago

What does your dataset look like? Could you share an example for reference? Many thanks.
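The asker's actual data was not shared in this issue. For illustration only: if I recall the repo's SFT documentation correctly, the files in dataset_dir are expected to be Stanford Alpaca-style JSON, i.e. a list of objects with instruction/input/output fields. The values below are made up.

```json
[
  {
    "instruction": "What is your name?",
    "input": "",
    "output": "My name is Example-Assistant."
  },
  {
    "instruction": "Please introduce yourself.",
    "input": "",
    "output": "I am Example-Assistant, a Chinese-Alpaca-2 model fine-tuned with a custom name."
  }
]
```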

github-actions[bot] commented 6 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.

github-actions[bot] commented 6 months ago

Closing the issue, since no updates observed. Feel free to re-open if you need any further assistance.