hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

GLM4 SFT: train loss becomes 0 from around epoch 0.45 onward #4084

Closed · maiqingqiang closed this issue 4 months ago

maiqingqiang commented 4 months ago

Reminder

System Info

Package  Version  Editable project location

accelerate  0.30.1
aiofiles  23.2.1
aiohttp  3.9.5
aiosignal  1.3.1
altair  5.3.0
annotated-types  0.6.0
anyio  4.3.0
async-timeout  4.0.3
attrs  23.2.0
auto_gptq  0.7.1
bitsandbytes  0.43.1
blinker  1.8.2
cachetools  5.3.3
certifi  2024.2.2
charset-normalizer  3.3.2
click  8.1.7
cloudpickle  3.0.0
cmake  3.29.3
coloredlogs  15.0.1
contourpy  1.2.1
cycler  0.12.1
dataclasses-json  0.6.6
datasets  2.19.2
deepdiff  7.0.1
deepspeed  0.14.0
dill  0.3.7
diskcache  5.6.3
distro  1.9.0
dnspython  2.6.1
docstring_parser  0.16
einops  0.8.0
email_validator  2.1.1
exceptiongroup  1.2.1
fastapi  0.111.0
fastapi-cli  0.0.3
ffmpy  0.3.2
filelock  3.14.0
fire  0.6.0
fonttools  4.51.0
frozenlist  1.4.1
fsspec  2024.3.1
gekko  1.1.1
gitdb  4.0.11
GitPython  3.1.43
gradio  4.31.3
gradio_client  0.16.3
greenlet  3.0.3
h11  0.14.0
hjson  3.1.0
httpcore  1.0.5
httptools  0.6.1
httpx  0.27.0
huggingface-hub  0.23.0
humanfriendly  10.0
idna  3.7
importlib_resources  6.4.0
interegular  0.3.3
jieba  0.42.1
Jinja2  3.1.4
joblib  1.4.2
jsonpatch  1.33
jsonpointer  2.4
jsonschema  4.22.0
jsonschema-specifications  2023.12.1
kiwisolver  1.4.5
langchain  0.1.20
langchain-community  0.0.38
langchain-core  0.1.52
langchain-text-splitters  0.0.2
langsmith  0.1.59
lark  1.1.9
llamafactory  0.7.2.dev0  /root/LLaMA-Factory
llmtuner  0.7.2.dev0  /root/LLaMA-Factory
llvmlite  0.42.0
lm-format-enforcer  0.10.1
markdown-it-py  3.0.0
MarkupSafe  2.1.5
marshmallow  3.21.2
matplotlib  3.9.0
mdurl  0.1.2
mpmath  1.3.0
msgpack  1.0.8
multidict  6.0.5
multiprocess  0.70.15
mypy-extensions  1.0.0
nest-asyncio  1.6.0
networkx  3.3
ninja  1.11.1.1
nltk  3.8.1
numba  0.59.1
numpy  1.26.4
nvidia-cublas-cu12  12.1.3.1
nvidia-cuda-cupti-cu12  12.1.105
nvidia-cuda-nvrtc-cu12  12.1.105
nvidia-cuda-runtime-cu12  12.1.105
nvidia-cudnn-cu12  8.9.2.26
nvidia-cufft-cu12  11.0.2.54
nvidia-curand-cu12  10.3.2.106
nvidia-cusolver-cu12  11.4.5.107
nvidia-cusparse-cu12  12.1.0.106
nvidia-ml-py  12.550.52
nvidia-nccl-cu12  2.20.5
nvidia-nvjitlink-cu12  12.4.127
nvidia-nvtx-cu12  12.1.105
openai  1.30.1
optimum  1.20.0
ordered-set  4.1.0
orjson  3.10.3
outlines  0.0.34
packaging  23.2
pandas  2.2.2
peft  0.11.1
pillow  10.3.0
pip  24.0
prometheus_client  0.20.0
prometheus-fastapi-instrumentator  7.0.0
protobuf  4.25.3
psutil  5.9.8
py-cpuinfo  9.0.0
pyarrow  16.1.0
pyarrow-hotfix  0.6
pydantic  2.7.1
pydantic_core  2.18.2
pydeck  0.9.1
pydub  0.25.1
Pygments  2.18.0
pynvml  11.5.0
pyparsing  3.1.2
python-dateutil  2.9.0.post0
python-dotenv  1.0.1
python-multipart  0.0.9
pytz  2024.1
PyYAML  6.0.1
ray  2.22.0
referencing  0.35.1
regex  2024.5.15
requests  2.32.3
rich  13.7.1
rouge  1.0.1
rouge-chinese  1.0.3
rpds-py  0.18.1
ruff  0.4.4
safetensors  0.4.3
schedule  1.2.1
scipy  1.13.0
semantic-version  2.10.0
sentencepiece  0.2.0
setuptools  69.5.1
shellingham  1.5.4
shtab  1.7.1
six  1.16.0
smmap  5.0.1
sniffio  1.3.1
socksio  1.0.0
SQLAlchemy  2.0.30
sse-starlette  2.1.0
starlette  0.37.2
streamlit  1.34.0
sympy  1.12
tenacity  8.3.0
termcolor  2.4.0
tiktoken  0.6.0
tokenizers  0.19.1
toml  0.10.2
tomlkit  0.12.0
toolz  0.12.1
torch  2.3.0
tornado  6.4
tqdm  4.66.4
transformers  4.41.2
transformers-stream-generator  0.0.5
triton  2.3.0
trl  0.8.6
typer  0.12.3
typing_extensions  4.11.0
typing-inspect  0.9.0
tyro  0.8.4
tzdata  2024.1
ujson  5.10.0
urllib3  2.2.1
uvicorn  0.29.0
uvloop  0.19.0
vllm  0.4.3
vllm-flash-attn  2.5.8.post2
vllm_nccl_cu12  2.18.1.0.4.0
watchdog  4.0.0
watchfiles  0.21.0
websockets  11.0.3
wheel  0.43.0
xformers  0.0.26.post1
xxhash  3.4.1
yarl  1.9.4

Reproduction

llamafactory-cli train \
    --stage sft \
    --do_train True \
    --model_name_or_path THUDM/glm-4-9b-chat \
    --finetuning_type lora \
    --template glm4 \
    --flash_attn auto \
    --use_unsloth False \
    --dataset_dir data \
    --dataset xxxx \
    --cutoff_len 1024 \
    --learning_rate 3e-4 \
    --num_train_epochs 2 \
    --max_samples 100000 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 100 \
    --warmup_steps 0 \
    --optim adamw_torch \
    --packing False \
    --report_to none \
    --output_dir saves/THUDM/glm-4-9b-chat/395/lora/train_20240605161602 \
    --fp16 True \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0 \
    --lora_target all \
    --val_size 0.10 \
    --evaluation_strategy steps \
    --eval_steps 100 \
    --per_device_eval_batch_size 1 \
    --load_best_model_at_end True \
    --preprocessing_num_workers 32 \
    --plot_loss True \
    --overwrite_cache True \
    --ddp_timeout 180000000

Expected behavior

[Screenshot: iTerm2 training log, 2024-06-05 16:40:24]

Others

No response

hiyouga commented 4 months ago

The learning rate is too high.
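Based on that diagnosis, one way to retry is to keep the command from the Reproduction section but lower the LoRA learning rate and add a brief warmup. A minimal sketch only: the values 1e-4 and 100 warmup steps are illustrative assumptions, not numbers given in this thread, and the new output directory is a hypothetical placeholder; every flag not shown here can stay exactly as in the Reproduction command.

# Sketch of a retry with a smaller learning rate and warmup (illustrative values).
# Flags omitted here are assumed unchanged from the Reproduction command above.
llamafactory-cli train \
    --stage sft \
    --do_train True \
    --model_name_or_path THUDM/glm-4-9b-chat \
    --finetuning_type lora \
    --template glm4 \
    --dataset_dir data \
    --dataset xxxx \
    --cutoff_len 1024 \
    --learning_rate 1e-4 \
    --warmup_steps 100 \
    --num_train_epochs 2 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0 \
    --lora_target all \
    --fp16 True \
    --output_dir saves/THUDM/glm-4-9b-chat/395/lora/train_retry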