modelscope / ms-swift

Use PEFT or Full-parameter to finetune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
https://swift.readthedocs.io/zh-cn/latest/Instruction/index.html
Apache License 2.0
4.16k stars 368 forks

Process hang with futex(0x7f403c0199d0, FUTEX_WAIT, 14826, NULL #1128

Closed QiMingChina closed 2 weeks ago

QiMingChina commented 5 months ago

Hardware information: [image]

Environment: Python 3.11.9, torch 2.1.1, CUDA 12.1, ms-swift 2.2.0.dev0

Script executed:

CUDA_VISIBLE_DEVICES=1 swift sft \
    --model_type chinese-llama-2-7b \
    --model_id_or_path /data/public/qim/model/chinese-llama-2-7b \
    --dataset school-math-zh \
    --output_dir /data/public/qim/script/swift/sft

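Symptom: [image] The process hangs at this point and does not respond for a long time.

Checking the processes: [image]

Tracing the processes: strace -p 14770 [image]; strace -p 9471 [image]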
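
strace only shows that a thread is blocked in a futex wait; it does not show which Python code is holding it there. As a rough sketch, assuming a few lines can be added at the entry point of the training script on a POSIX system, the standard-library `faulthandler` module can print the Python stack of every thread on demand. The helper name `enable_hang_diagnostics` and its parameters are hypothetical and are not part of ms-swift.

```python
import faulthandler
import signal
import sys


def enable_hang_diagnostics(timeout_s: float = 0.0, stream=None) -> None:
    """Hypothetical helper: make a hung process report its Python stacks.

    stream must be a real file with a file descriptor (defaults to stderr).
    """
    if stream is None:
        stream = sys.stderr
    # Dump the traceback of every thread when the process receives SIGUSR1.
    faulthandler.register(signal.SIGUSR1, file=stream, all_threads=True)
    if timeout_s > 0:
        # Optionally also dump the tracebacks every timeout_s seconds.
        faulthandler.dump_traceback_later(timeout_s, repeat=True, file=stream)
```

With this called once at startup, running `kill -USR1 <pid>` against the stuck process writes the current Python call stacks to the chosen stream, which can narrow down whether the wait is in data loading, model loading, or elsewhere.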

tastelikefeet commented 2 months ago

This issue is rather odd. Can you still reproduce it?

slin000111 commented 2 weeks ago

I could not reproduce it; it may be specific to the machine.