Closed: adam-mhd94 closed this issue 7 months ago.
This may be a case of underfitting.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.
Thank you. Because each GPU has only 16 GB of memory, I cannot increase the batch size. Could the problem be caused by the very small batch size?
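For reference, the effective batch per optimizer step follows directly from the values in the training script of this issue (per_device_train_batch_size=1, gradient_accumulation_steps=1, block_size=32) and the six GPUs. Gradient accumulation raises the effective batch without enlarging the per-GPU micro-batch, so it is one lever the 16 GB limit does not block. The function name and the accumulation value of 16 below are illustrative assumptions, not settings from the issue.

```python
def effective_batch(per_device_batch: int, grad_accum: int, num_gpus: int, block_size: int) -> tuple[int, int]:
    """Return (sequences, tokens) consumed per optimizer step under data parallelism."""
    # Each GPU processes `per_device_batch` sequences per micro-step, and gradients
    # are accumulated over `grad_accum` micro-steps before the optimizer updates.
    sequences = per_device_batch * grad_accum * num_gpus
    # Each training sample is a chunk of `block_size` tokens.
    return sequences, sequences * block_size


# Values from the training script in this issue: 6 GPUs, batch 1, no accumulation, block_size 32.
print(effective_batch(1, 1, 6, 32))   # (6, 192)
# Illustrative only: same per-GPU micro-batch, 16 accumulation steps.
print(effective_batch(1, 16, 6, 32))  # (96, 3072)
```

With the current settings each update sees only 192 tokens, and `block_size=32` also limits every sample to 32 tokens of context.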
Closing the issue, since no updates were observed. Feel free to re-open it if you need any further assistance.
Check before submitting issues
Type of Issue
Model training and fine-tuning
Base Model
Chinese-LLaMA-2 (7B/13B)
Operating System
Linux
Describe your issue in detail
I intend to fine-tune the LLaMA 7B model on non-Chinese data. Training the model on a large dataset with the original LLaMA tokenizer yields good results. However, when I use a tokenizer tailored to my language, the loss increases significantly and the model performs very poorly; for example, it keeps repeating a single word or character.
GPUs: six 16 GB T4s. I am training the model in multi-GPU mode.
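The issue does not say whether the tailored tokenizer extends the original LLaMA vocabulary (as Chinese-LLaMA-2 does for Chinese) or replaces it. If it replaces it, token ID i no longer refers to the token that pretrained embedding row i was trained on, which by itself could explain a much higher loss. A minimal check, assuming both vocabularies are available as token-to-ID dicts; the function name and the toy vocabularies are hypothetical:

```python
def vocab_alignment(old_vocab: dict[str, int], new_vocab: dict[str, int]) -> tuple[float, float]:
    """Compare a new tokenizer vocabulary against the original one.

    Returns (same_id_fraction, shared_token_fraction):
    - same_id_fraction: share of new tokens whose ID maps to the same token string
      in the old vocabulary, i.e. whose pretrained embedding row still matches.
    - shared_token_fraction: share of new tokens present anywhere in the old vocabulary.
    """
    if not new_vocab:
        raise ValueError("new_vocab must not be empty")
    old_id_to_token = {idx: tok for tok, idx in old_vocab.items()}
    same_id = sum(1 for tok, idx in new_vocab.items() if old_id_to_token.get(idx) == tok)
    shared = sum(1 for tok in new_vocab if tok in old_vocab)
    return same_id / len(new_vocab), shared / len(new_vocab)


# Toy vocabularies (hypothetical): the special tokens keep their IDs,
# ID 3 is reassigned to a new token, and "a" moves to ID 4.
old = {"<unk>": 0, "<s>": 1, "</s>": 2, "a": 3, "b": 4}
new = {"<unk>": 0, "<s>": 1, "</s>": 2, "x": 3, "a": 4}
print(vocab_alignment(old, new))  # (0.6, 0.8)
```

With Hugging Face tokenizers, each vocabulary can be obtained as such a dict via `tokenizer.get_vocab()`. A same-ID fraction near 0 would mean the pretrained `embed_tokens` and `lm_head` rows no longer match the new token IDs and must be relearned from the fine-tuning data.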
# Read the wiki (https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/pt_scripts_zh) carefully before running the script.
lr=2e-4
lora_rank=64
lora_alpha=128
lora_trainable="q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj"
modules_to_save="embed_tokens,lm_head"
lora_dropout=0.05

per_device_train_batch_size=1
gradient_accumulation_steps=1
block_size=32

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 torchrun --nnodes 1 --nproc_per_node 6 --master_port 5896 run_clm_pt_with_peft.py \
    --deepspeed ${deepspeed_config_file} \
    --model_name_or_path ${pretrained_model} \
    --tokenizer_name_or_path ${pretrained_model} \
    --dataset_dir ${dataset_dir} \
    --data_cache_dir ${data_cache} \
    --validation_split_percentage 0.001 \
    --per_device_train_batch_size ${per_device_train_batch_size} \
    --do_train \
    --seed $RANDOM \
    --num_train_epochs 1 \
    --lr_scheduler_type cosine \
    --learning_rate ${lr} \
    --warmup_ratio 0.05 \
    --weight_decay 0.01 \
    --logging_strategy steps \
    --logging_steps 10 \
    --save_strategy steps \
    --save_total_limit 2 \
    --save_steps 200 \
    --gradient_accumulation_steps ${gradient_accumulation_steps} \
    --preprocessing_num_workers 16 \
    --block_size ${block_size} \
    --output_dir ${output_dir} \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --lora_rank ${lora_rank} \
    --lora_alpha ${lora_alpha} \
    --trainable ${lora_trainable} \
    --lora_dropout ${lora_dropout} \
    --modules_to_save ${modules_to_save} \
    --torch_dtype float32 \
    --load_in_kbits 8 \
    --save_safetensors False \
    --gradient_checkpointing \
    --ddp_find_unused_parameters False
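Two settings in this script interact with the new tokenizer: `modules_to_save="embed_tokens,lm_head"` makes the full embedding and output matrices trainable, while the LoRA adapters cover the attention and MLP projections listed in `lora_trainable`. A rough parameter count shows how the trainable capacity splits. It assumes LLaMA-2-7B dimensions (hidden size 4096, MLP size 11008, 32 layers) and PEFT's usual LoRA shapes (A is r×in, B is out×r, scaled by lora_alpha / r). The custom tokenizer's vocabulary size is not given in the issue, so 32000 (the original LLaMA-2 vocabulary) is only a placeholder; the helper names are hypothetical.

```python
# LLaMA-2-7B dimensions (assumed here; not stated in the issue).
HIDDEN = 4096
INTERMEDIATE = 11008
NUM_LAYERS = 32

# (in_features, out_features) of each LoRA target module in one decoder layer.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) to every target module in every layer."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in TARGETS.values())
    return per_layer * NUM_LAYERS


def saved_module_params(vocab_size: int) -> int:
    """embed_tokens and lm_head are fully trainable via modules_to_save: vocab x hidden each."""
    return 2 * vocab_size * HIDDEN


rank, alpha = 64, 128
print("LoRA scaling (alpha / r):", alpha / rank)                             # 2.0
print("LoRA parameters:", lora_params(rank))                                 # 159907840
print("embed_tokens + lm_head (vocab 32000):", saved_module_params(32000))  # 262144000
```

For any vocabulary larger than about 19,500 tokens, the embedding and output matrices hold more trainable parameters than all LoRA adapters combined, and if the vocabulary was replaced rather than extended, those rows have to be learned largely from the fine-tuning data.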
Dependencies (must be provided for code-related issues)
accelerate==0.27.2 aiofiles==23.2.1 aiohttp==3.9.3 aiosignal==1.3.1 altair==5.2.0 anyio==4.3.0 appdirs==1.4.4 async-timeout==4.0.3 attrs==23.2.0 bitsandbytes==0.41.1 certifi==2024.2.2 charset-normalizer==3.3.2 click==8.1.7 contourpy==1.2.0 cycler==0.12.1 datasets==2.14.5 deepspeed==0.11.0 dill==0.3.7 docker-pycreds==0.4.0 exceptiongroup==1.2.0 fastapi==0.109.2 ffmpy==0.3.2 filelock==3.13.1 fire==0.5.0 fonttools==4.49.0 frozenlist==1.4.1 fsspec==2023.6.0 gitdb==4.0.11 GitPython==3.1.42 gradio==3.50.2 gradio_client==0.6.1 h11==0.14.0 hjson==3.1.0 httpcore==1.0.3 httpx==0.26.0 huggingface-hub==0.17.3 idna==3.6 importlib-resources==6.1.1 Jinja2==3.1.2 joblib==1.3.2 jsonschema==4.21.1 jsonschema-specifications==2023.12.1 kiwisolver==1.4.5 MarkupSafe==2.1.3 matplotlib==3.8.3 mpmath==1.3.0 multidict==6.0.5 multiprocess==0.70.15 networkx==3.2.1 ninja==1.11.1.1 numpy==1.26.4 nvidia-cublas-cu11==11.11.3.6 nvidia-cuda-cupti-cu11==11.8.87 nvidia-cuda-nvrtc-cu11==11.8.89 nvidia-cuda-runtime-cu11==11.8.89 nvidia-cudnn-cu11==8.7.0.84 nvidia-cufft-cu11==10.9.0.58 nvidia-curand-cu11==10.3.0.86 nvidia-cusolver-cu11==11.4.1.48 nvidia-cusparse-cu11==11.7.5.86 nvidia-nccl-cu11==2.19.3 nvidia-nvtx-cu11==11.8.86 orjson==3.9.14 packaging==23.2 pandas==2.2.0 pathtools==0.1.2 peft==0.3.0 pillow==10.2.0 protobuf==4.25.3 psutil==5.9.8 py-cpuinfo==9.0.0 pyarrow==15.0.0 pydantic==1.10.14 pydub==0.25.1 pyparsing==3.1.1 python-dateutil==2.8.2 python-multipart==0.0.9 pytz==2024.1 PyYAML==6.0.1 referencing==0.33.0 regex==2023.12.25 requests==2.31.0 rpds-py==0.18.0 safetensors==0.4.2 scikit-learn==1.4.1.post1 scipy==1.11.1 semantic-version==2.10.0 sentencepiece==0.1.99 sentry-sdk==1.40.5 setproctitle==1.3.3 six==1.16.0 smmap==5.0.1 sniffio==1.3.0 starlette==0.36.3 sympy==1.12 termcolor==2.4.0 threadpoolctl==3.3.0 tokenizers==0.14.1 toolz==0.12.1 torch==2.2.0+cu118 torchaudio==2.2.0+cu118 torchvision==0.17.0+cu118 tqdm==4.66.2 transformers==4.34.0 triton==2.2.0 typing_extensions==4.9.0 tzdata==2024.1 
urllib3==2.2.1 uvicorn==0.27.1 wandb==0.15.12 websockets==11.0.3 xxhash==3.4.1 yarl==1.9.4
Execution logs or screenshots
The model's output just repeats a single word over and over and is completely meaningless. Do you know what might be causing this?
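To check whether a change (more data, more steps, a larger effective batch) actually reduces the repetition described above, it can help to score sample generations from successive checkpoints. A minimal sketch using the distinct-n ratio, a common diversity measure chosen here as an assumption (nothing in the issue or the repository prescribes it). It expects a list of tokens: whitespace-split words, or tokenizer output for languages written without spaces.

```python
def distinct_n(tokens: list[str], n: int = 2) -> float:
    """Fraction of unique n-grams among all n-grams in `tokens`.

    1.0 means no n-gram repeats; values near 0 indicate degenerate, looping output.
    """
    if len(tokens) < n:
        raise ValueError(f"need at least {n} tokens to form an n-gram")
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams)


print(distinct_n("the the the the the".split()))     # 0.25
print(distinct_n("the cat sat on the mat".split()))  # 1.0
```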