hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

Converting format of dataset (num_proc=15): 100%|██████████| 15/15 [00:00<00:00, 74.50 examples/s] — then it hangs #5782

Open alf-wangzhi opened 1 month ago

alf-wangzhi commented 1 month ago

Reminder

System Info

```
absl-py 2.1.0 accelerate 0.34.2 aiofiles 23.2.1 aiohttp 3.9.1 aiosignal 1.3.1 annotated-types 0.6.0 anyio 4.6.2.post1 apex 0.1 argon2-cffi 23.1.0 argon2-cffi-bindings 21.2.0 asttokens 2.4.1 astunparse 1.6.3 async-timeout 4.0.3 attrs 23.2.0 audioread 3.0.1 av 13.1.0 beautifulsoup4 4.12.3 bleach 6.1.0 blis 0.7.11 cachetools 5.3.2 catalogue 2.0.10 certifi 2024.2.2 cffi 1.16.0 charset-normalizer 3.3.2 click 8.1.7 cloudpathlib 0.16.0 cloudpickle 3.0.0 cmake 3.28.1 comm 0.2.1 confection 0.1.4 contourpy 1.2.0 cubinlinker 0.3.0+2.g405ac64 cuda-python 12.3.0rc4+9.gdb8c48a.dirty cudf 23.12.0 cugraph 23.12.0 cugraph-dgl 23.12.0 cugraph-service-client 23.12.0 cugraph-service-server 23.12.0 cuml 23.12.0 cupy-cuda12x 12.3.0 cycler 0.12.1 cymem 2.0.8 Cython 3.0.8 dask 2023.11.0 dask-cuda 23.12.0 dask-cudf 23.12.0 datasets 2.21.0 debugpy 1.8.1 decorator 5.1.1 deepspeed 0.14.4 defusedxml 0.7.1 dill 0.3.8 diskcache 5.6.3 distributed 2023.11.0 distro 1.9.0 dm-tree 0.1.8 docstring_parser 0.16 einops 0.7.0 exceptiongroup 1.2.0 execnet 2.0.2 executing 2.0.1 expecttest 0.1.3 fastapi 0.115.2 fastjsonschema 2.19.1 fastrlock 0.8.2 ffmpy 0.4.0 filelock 3.13.1 fire 0.7.0 fonttools 4.48.1 frozenlist 1.4.1 fsspec 2023.12.2 gast 0.5.4 gguf 0.10.0 google-auth 2.27.0 google-auth-oauthlib 0.4.6 gradio 4.44.1 gradio_client 1.3.0 graphsurgeon 0.4.6 grpcio 1.60.1 h11 0.14.0 hjson 3.1.0 httpcore 1.0.6 httptools 0.6.4 httpx 0.27.2 huggingface-hub 0.26.1 hypothesis 5.35.1 idna 3.6 importlib-metadata 7.0.1 importlib_resources 6.4.5 iniconfig 2.0.0 intel-openmp 2021.4.0 interegular 0.3.3 ipykernel 6.29.2 ipython 8.21.0 ipython-genutils 0.2.0 jedi 0.19.1 jieba 0.42.1 Jinja2 3.1.3 jiter 0.6.1 joblib 1.3.2 json5 0.9.14 jsonschema 4.21.1 jsonschema-specifications 2023.12.1 jupyter_client 8.6.0 jupyter_core 5.7.1 jupyter-tensorboard 0.2.0 jupyterlab 2.3.2 jupyterlab_pygments 0.3.0 jupyterlab-server 1.2.0 jupytext 1.16.1 kiwisolver 1.4.5 langcodes 3.3.0 lark 1.2.2 lazy_loader 0.3 librosa 0.10.1 llamafactory 0.9.1.dev0 /app llvmlite 0.43.0 lm-format-enforcer 0.10.6 locket 1.0.0 Markdown 3.5.2 markdown-it-py 3.0.0 MarkupSafe 2.1.4 matplotlib 3.8.2 matplotlib-inline 0.1.6 mdit-py-plugins 0.4.0 mdurl 0.1.2 mistral_common 1.4.4 mistune 3.0.2 mkl 2021.1.1
```

Reproduction

```
llamafactory-cli train \
    --stage sft \
    --do_train True \
    --model_name_or_path /data1/models/Qwen2___5-7B-Instruct \
    --preprocessing_num_workers 16 \
    --finetuning_type full \
    --template qwen \
    --flash_attn auto \
    --dataset_dir /data1/data/data \
    --dataset test \
    --cutoff_len 1024 \
    --learning_rate 5e-05 \
    --num_train_epochs 1.0 \
    --max_samples 100000 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 100 \
    --warmup_steps 0 \
    --optim adamw_torch \
    --packing False \
    --report_to none \
    --output_dir /data1/wangzhiqiang/zhapian/models/sft1 \
    --bf16 True \
    --plot_loss True \
    --ddp_timeout 180000000 \
    --include_num_input_tokens_seen True
```

Expected behavior

[screenshot: preprocessing hangs]
[screenshot: data format]
[screenshot: dataset_info]

Others

No response

Th4p4 commented 4 weeks ago

@alf-wangzhi did you get any workaround for this?

yzqtdu commented 2 days ago

Try setting a smaller value for preprocessing_num_workers if it is 128 or larger. That worked for me.
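The workaround above amounts to capping the worker count that gets passed down to datasets' multiprocessing `map()`. A minimal sketch of that idea, using a hypothetical helper (`safe_num_proc` is not part of LLaMA-Factory) and only the standard library:

```python
import os


def safe_num_proc(requested: int, num_examples: int) -> int:
    """Cap the preprocessing worker count.

    Hypothetical helper illustrating the suggestion above: a very
    large preprocessing_num_workers (e.g. 128+) can oversubscribe the
    machine and make multiprocess dataset conversion appear to hang,
    so clamp the value to the CPU count and the number of examples.
    """
    cpus = os.cpu_count() or 1
    return max(1, min(requested, cpus, num_examples))


if __name__ == "__main__":
    # With 15 examples (as in the log above), never spawn more than
    # 15 workers regardless of what was requested.
    print(safe_num_proc(128, 15))
```

The same effect is achieved here simply by passing a smaller `--preprocessing_num_workers` on the command line.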