QLoRA + FSDP + Unsloth training fails with: RuntimeError: Unsloth currently does not support multi GPU setups - but we are working on it! #5635
System Info
accelerate 0.32.0
addict 2.4.0
aiofiles 23.2.1
aiohttp 3.9.3
aiosignal 1.3.1
aliyun-python-sdk-core 2.15.0
aliyun-python-sdk-kms 2.16.2
altair 5.2.0
annotated-types 0.6.0
anyio 4.3.0
async-timeout 4.0.3
attrs 23.2.0
auto_gptq 0.7.1
av 13.1.0
bitsandbytes 0.44.1
certifi 2019.11.28
cffi 1.16.0
chardet 3.0.4
charset-normalizer 3.3.2
click 8.1.7
cloudpickle 3.0.0
cmake 3.29.2
coloredlogs 15.0.1
contourpy 1.2.0
crcmod 1.7
cryptography 42.0.5
cupy-cuda12x 12.1.0
cycler 0.12.1
datasets 2.20.0
dbus-python 1.2.16
deepspeed 0.14.4
dill 0.3.8
diskcache 5.6.3
distro 1.9.0
distro-info 0.23ubuntu1
docstring_parser 0.16
einops 0.7.0
exceptiongroup 1.2.0
fastapi 0.110.0
fastrlock 0.8.2
ffmpy 0.3.2
filelock 3.13.3
fire 0.6.0
fonttools 4.50.0
frozenlist 1.4.1
fsspec 2024.2.0
galore-torch 1.0
gast 0.5.4
gekko 1.0.7
gradio 4.29.0
gradio_client 0.16.1
h11 0.14.0
hf_transfer 0.1.8
hjson 3.1.0
httpcore 1.0.4
httptools 0.6.1
httpx 0.27.0
huggingface-hub 0.23.3
humanfriendly 10.0
idna 2.8
importlib_metadata 7.1.0
importlib_resources 6.4.0
interegular 0.3.3
jieba 0.42.1
Jinja2 3.1.3
jmespath 0.10.0
joblib 1.3.2
jsonschema 4.21.1
jsonschema-specifications 2023.12.1
kiwisolver 1.4.5
lark 1.1.9
liger_kernel 0.3.0
llamafactory 0.9.1.dev0 /hy-tmp/LLaMA-Factory-main
llvmlite 0.42.0
lm-format-enforcer 0.10.1
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib 3.8.3
mdurl 0.1.2
modelscope 1.13.3
mpmath 1.3.0
msgpack 1.0.8
multidict 6.0.5
multiprocess 0.70.16
nest-asyncio 1.6.0
networkx 3.2.1
ninja 1.11.1.1
nltk 3.9.1
numba 0.59.1
numpy 1.26.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-ml-py 12.555.43
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.4.99
nvidia-nvtx-cu12 12.1.105
openai 1.34.0
optimum 1.16.0
orjson 3.9.15
oss2 2.18.4
outlines 0.0.34
packaging 24.0
pandas 2.2.1
peft 0.12.0
pillow 10.2.0
pip 24.2
platformdirs 4.2.0
prometheus_client 0.20.0
prometheus-fastapi-instrumentator 7.0.0
protobuf 3.20.3
psutil 5.9.8
py-cpuinfo 9.0.0
pyarrow 15.0.2
pyarrow-hotfix 0.6
pycparser 2.21
pycryptodome 3.20.0
pydantic 2.6.4
pydantic_core 2.16.3
pydub 0.25.1
Pygments 2.17.2
PyGObject 3.36.0
pynvml 11.5.0
pyparsing 3.1.2
python-apt 2.0.1+ubuntu0.20.4.1
python-dateutil 2.9.0.post0
python-dotenv 1.0.1
python-multipart 0.0.9
pytz 2024.1
PyYAML 6.0.1
ray 2.10.0
referencing 0.34.0
regex 2023.12.25
requests 2.32.3
requests-unixsocket 0.2.0
rich 13.7.1
rouge 1.0.1
rouge-chinese 1.0.3
rpds-py 0.18.0
ruff 0.4.6
safetensors 0.4.5
scipy 1.12.0
semantic-version 2.10.0
sentencepiece 0.2.0
setuptools 69.2.0
shellingham 1.5.4
shtab 1.7.1
simplejson 3.19.2
six 1.14.0
sniffio 1.3.1
sortedcontainers 2.4.0
sse-starlette 2.0.0
ssh-import-id 5.10
starlette 0.36.3
sympy 1.12
termcolor 2.4.0
tiktoken 0.6.0
tokenizers 0.19.1
tomli 2.0.1
tomlkit 0.12.0
toolz 0.12.1
torch 2.4.1
tqdm 4.66.5
transformers 4.43.4
triton 3.0.0
trl 0.9.6
typer 0.12.3
typing_extensions 4.10.0
tyro 0.7.3
tzdata 2024.1
unattended-upgrades 0.1
unsloth 2024.9.post4
urllib3 2.2.1
uvicorn 0.29.0
uvloop 0.19.0
vllm 0.4.3
vllm-flash-attn 2.5.8.post2
watchfiles 0.21.0
websockets 11.0.3
wheel 0.43.0
xformers 0.0.28.post1
xxhash 3.4.1
yapf 0.40.2
yarl 1.9.4
zipp 3.18.1
Reproduction
### model
model_name_or_path: /hy-tmp/LLaMA-Factory-main/model/Qwen/Qwen2.5-32B-Instruct/
quantization_bit: 4

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
lora_rank: 8
lora_alpha: 16

### long-lora
shift_attn: true
enable_liger_kernel: true
use_unsloth_gc: true
flash_attn: auto
use_unsloth: true

### dataset
dataset: Qwen25-0013
template: qwen
cutoff_len: 204800
max_new_tokens: 8192
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/Qwen2.5-32B-Instruct/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 2
learning_rate: 1.0e-6
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
fp16: true
ddp_timeout: 180000000
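The original report does not include the launch command, but the rank3 prefix in the traceback shows a multi-process distributed run (at least 4 GPUs). A typical LLaMA-Factory multi-GPU launch that would exercise this code path looks like the sketch below; the YAML filename is my own placeholder for the config above, not something from the report:

FORCE_TORCHRUN=1 llamafactory-cli train qwen2_5_32b_lora_sft.yaml

Launched this way, training fails during model loading with the traceback below.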
....
[rank3]: raise RuntimeError('Unsloth currently does not support multi GPU setups - but we are working on it!')
[rank3]: RuntimeError: Unsloth currently does not support multi GPU setups - but we are working on it!
Expected behavior
Ultra-long-context training on a single node with multiple GPUs should run normally with Unsloth.
Judging from the error message, Unsloth does not work in multi-GPU mode. However, searching the Unsloth material turns up conflicting claims: some sources say it supports multiple GPUs, others say it does not, which is quite confusing. Asking the maintainers and anyone else here: has anyone successfully trained with Unsloth + FSDP + QLoRA? (Two workaround sketches follow below.)
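For anyone hitting the same wall, two workarounds seem plausible given that the open-source Unsloth release pinned here (unsloth 2024.9.post4) refuses to initialize in a distributed run. This is a minimal sketch, not verified against this exact setup: the YAML filename is the same placeholder as above, and the accelerate config path assumes LLaMA-Factory's bundled examples/accelerate/fsdp_config.yaml is present in the repository checkout.

# Option A (assumed): keep use_unsloth: true but restrict the run to one GPU,
# so Unsloth's multi-GPU check never triggers
CUDA_VISIBLE_DEVICES=0 llamafactory-cli train qwen2_5_32b_lora_sft.yaml

# Option B (assumed): drop Unsloth for multi-GPU FSDP + QLoRA, i.e. set
# use_unsloth: false in the YAML while keeping quantization_bit: 4,
# then launch through accelerate with an FSDP config
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch \
    --config_file examples/accelerate/fsdp_config.yaml \
    src/train.py qwen2_5_32b_lora_sft.yaml

Option B trades Unsloth's fused kernels for standard PEFT + bitsandbytes, so per-GPU throughput likely drops, but the model can be sharded across GPUs.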
Others
No response