hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

Converting format of dataset (num_proc=15): 100%|██████████| 15/15 [00:00<00:00, 74.50 examples/s] then hangs #5782

Open alf-wangzhi opened 2 weeks ago

alf-wangzhi commented 2 weeks ago

Reminder

System Info

absl-py 2.1.0
accelerate 0.34.2
aiofiles 23.2.1
aiohttp 3.9.1
aiosignal 1.3.1
annotated-types 0.6.0
anyio 4.6.2.post1
apex 0.1
argon2-cffi 23.1.0
argon2-cffi-bindings 21.2.0
asttokens 2.4.1
astunparse 1.6.3
async-timeout 4.0.3
attrs 23.2.0
audioread 3.0.1
av 13.1.0
beautifulsoup4 4.12.3
bleach 6.1.0
blis 0.7.11
cachetools 5.3.2
catalogue 2.0.10
certifi 2024.2.2
cffi 1.16.0
charset-normalizer 3.3.2
click 8.1.7
cloudpathlib 0.16.0
cloudpickle 3.0.0
cmake 3.28.1
comm 0.2.1
confection 0.1.4
contourpy 1.2.0
cubinlinker 0.3.0+2.g405ac64
cuda-python 12.3.0rc4+9.gdb8c48a.dirty
cudf 23.12.0
cugraph 23.12.0
cugraph-dgl 23.12.0
cugraph-service-client 23.12.0
cugraph-service-server 23.12.0
cuml 23.12.0
cupy-cuda12x 12.3.0
cycler 0.12.1
cymem 2.0.8
Cython 3.0.8
dask 2023.11.0
dask-cuda 23.12.0
dask-cudf 23.12.0
datasets 2.21.0
debugpy 1.8.1
decorator 5.1.1
deepspeed 0.14.4
defusedxml 0.7.1
dill 0.3.8
diskcache 5.6.3
distributed 2023.11.0
distro 1.9.0
dm-tree 0.1.8
docstring_parser 0.16
einops 0.7.0
exceptiongroup 1.2.0
execnet 2.0.2
executing 2.0.1
expecttest 0.1.3
fastapi 0.115.2
fastjsonschema 2.19.1
fastrlock 0.8.2
ffmpy 0.4.0
filelock 3.13.1
fire 0.7.0
fonttools 4.48.1
frozenlist 1.4.1
fsspec 2023.12.2
gast 0.5.4
gguf 0.10.0
google-auth 2.27.0
google-auth-oauthlib 0.4.6
gradio 4.44.1
gradio_client 1.3.0
graphsurgeon 0.4.6
grpcio 1.60.1
h11 0.14.0
hjson 3.1.0
httpcore 1.0.6
httptools 0.6.4
httpx 0.27.2
huggingface-hub 0.26.1
hypothesis 5.35.1
idna 3.6
importlib-metadata 7.0.1
importlib_resources 6.4.5
iniconfig 2.0.0
intel-openmp 2021.4.0
interegular 0.3.3
ipykernel 6.29.2
ipython 8.21.0
ipython-genutils 0.2.0
jedi 0.19.1
jieba 0.42.1
Jinja2 3.1.3
jiter 0.6.1
joblib 1.3.2
json5 0.9.14
jsonschema 4.21.1
jsonschema-specifications 2023.12.1
jupyter_client 8.6.0
jupyter_core 5.7.1
jupyter-tensorboard 0.2.0
jupyterlab 2.3.2
jupyterlab_pygments 0.3.0
jupyterlab-server 1.2.0
jupytext 1.16.1
kiwisolver 1.4.5
langcodes 3.3.0
lark 1.2.2
lazy_loader 0.3
librosa 0.10.1
llamafactory 0.9.1.dev0 /app
llvmlite 0.43.0
lm-format-enforcer 0.10.6
locket 1.0.0
Markdown 3.5.2
markdown-it-py 3.0.0
MarkupSafe 2.1.4
matplotlib 3.8.2
matplotlib-inline 0.1.6
mdit-py-plugins 0.4.0
mdurl 0.1.2
mistral_common 1.4.4
mistune 3.0.2
mkl 2021.1.1

Reproduction

llamafactory-cli train \
    --stage sft \
    --do_train True \
    --model_name_or_path /data1/models/Qwen2___5-7B-Instruct \
    --preprocessing_num_workers 16 \
    --finetuning_type full \
    --template qwen \
    --flash_attn auto \
    --dataset_dir /data1/data/data \
    --dataset test \
    --cutoff_len 1024 \
    --learning_rate 5e-05 \
    --num_train_epochs 1.0 \
    --max_samples 100000 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 100 \
    --warmup_steps 0 \
    --optim adamw_torch \
    --packing False \
    --report_to none \
    --output_dir /data1/wangzhiqiang/zhapian/models/sft1 \
    --bf16 True \
    --plot_loss True \
    --ddp_timeout 180000000 \
    --include_num_input_tokens_seen True

Expected behavior

[image]
It hangs at this point.
Data format: [image]
datasetinfo: [image]

Others

No response

Th4p4 commented 1 week ago

@alf-wangzhi did you find a workaround for this?