Quantization stops for no reason when GPTQ quantizing Qwen2-72b-Instruct

edgeinfinity-wzt commented 5 months ago

Describe the bug
What the bug is, and how to reproduce, better with screenshots

At step 78/80, quantization stops for no apparent reason, and I see no debug output or error message. The problem occurs only with qwen2-72b; qwen1.5-72b and qwen2-7b both quantize normally. These are my quantization parameters:

echo "BMC is:"
read bmc
CUDA_VISIBLE_DEVICES=0,1 swift export \
    --ckpt_dir "$bmc" \
    --quant_method gptq \
    --quant_bits 4 \
    --load_args_from_ckpt_dir True \
    --load_dataset_config True \
    --quant_device_map cpu \
    --quant_n_samples 32 \
    --quant_seqlen 32 \
    --max_length 256 \
    --dataset dataset/moni_dataset.jsonl

Your hardware and system info
Write your system info like CUDA version/system/GPU/torch version here

Ubuntu 24.04 LTS, 8x 4090

pip freeze:
absl-py==2.1.0 accelerate==0.30.0 addict==2.4.0 aiofiles==23.2.1 aiohttp==3.9.5 aioprometheus==23.12.0 aiosignal==1.3.1 aliyun-python-sdk-core==2.15.1 aliyun-python-sdk-kms==2.16.2 altair==5.3.0 annotated-types==0.6.0 anyio==4.3.0 argon2-cffi==23.1.0 argon2-cffi-bindings==21.2.0 arrow==1.3.0 arxiv==2.1.0 asttokens==2.4.1 async-lru==2.0.4 attrs==23.2.0 auto_gptq==0.7.1 autoawq==0.2.5 autoawq_kernels==0.0.6 Babel==2.15.0 beautifulsoup4==4.12.3 bitsandbytes==0.43.1 bleach==6.1.0 blinker==1.8.2 cachetools==5.3.3 certifi==2024.2.2 cffi==1.16.0 charset-normalizer==3.3.2 click==8.1.7 cloudpickle==3.0.0 cmake==3.29.2 colorama==0.4.6 coloredlogs==15.0.1 comm==0.2.2 contourpy==1.2.1 cpm-kernels==1.0.11 crcmod==1.7 cryptography==42.0.6 cupy-cuda12x==12.1.0 cycler==0.12.1 dacite==1.8.1 datasets==2.18.0 debugpy==1.8.1 decorator==5.1.1 decord==0.6.0 defusedxml==0.7.1 diffusers==0.25.0 dill==0.3.8 diskcache==5.6.3 distro==1.9.0 dnspython==2.6.1 docstring_parser==0.16 editdistance==0.8.1 einops==0.8.0 email_validator==2.1.1 et-xmlfile==1.1.0 eventlet==0.36.1 executing==2.0.1 fastapi==0.111.0 fastapi-cli==0.0.2 fastjsonschema==2.19.1 fastrlock==0.8.2 feedparser==6.0.10 ffmpy==0.3.2 filelock==3.14.0 fonttools==4.51.0 fqdn==1.5.1 frozenlist==1.4.1 fsspec==2024.2.0 func-timeout==4.3.5 gast==0.5.4 gekko==1.1.1 gitdb==4.0.11 GitPython==3.1.43 gradio==4.29.0 gradio_client==0.16.1 greenlet==3.0.3 griffe==0.45.3 grpcio==1.63.0 h11==0.14.0 httpcore==1.0.5 httptools==0.6.1 httpx==0.27.0 huggingface-hub==0.23.0 humanfriendly==10.0 idna==3.7 importlib_metadata==7.1.0 importlib_resources==6.4.0 interegular==0.3.3 ipykernel==6.29.4 ipython==8.25.0 ipywidgets==8.1.3 isoduration==20.11.0 jedi==0.19.1 jieba==0.42.1 Jinja2==3.1.4 jmespath==0.10.0 joblib==1.4.2 json5==0.9.25 jsonlines==4.0.0 jsonpointer==3.0.0
jsonschema==4.22.0 jsonschema-specifications==2023.12.1 jupyter==1.0.0 jupyter-console==6.6.3 jupyter-events==0.10.0 jupyter-lsp==2.2.5 jupyter_client==8.6.2 jupyter_core==5.7.2 jupyter_server==2.14.1 jupyter_server_terminals==0.5.3 jupyterlab==4.2.2 jupyterlab_pygments==0.3.0 jupyterlab_server==2.27.2 jupyterlab_widgets==3.0.11 kiwisolver==1.4.5 lagent==0.2.2 lark==1.1.9 llmuses==0.3.1 llvmlite==0.42.0 lm-format-enforcer==0.10.1 lxml==5.2.1 Markdown==3.6 markdown-it-py==3.0.0 MarkupSafe==2.1.5 matplotlib==3.8.4 matplotlib-inline==0.1.7 mdurl==0.1.2 mistune==3.0.2 mmengine==0.10.4 modelscope==1.14.0 mpmath==1.3.0 ms-swift==2.1.0 msgpack==1.0.8 multidict==6.0.5 multiprocess==0.70.16 nbclient==0.10.0 nbconvert==7.16.4 nbformat==5.10.4 nest-asyncio==1.6.0 networkx==3.3 ninja==1.11.1.1 nltk==3.8.1 notebook==7.2.1 notebook_shim==0.2.4 numba==0.59.1 numpy==1.26.4 nvidia-cublas-cu12==12.1.3.1 nvidia-cuda-cupti-cu12==12.1.105 nvidia-cuda-nvrtc-cu12==12.1.105 nvidia-cuda-runtime-cu12==12.1.105 nvidia-cudnn-cu12==8.9.2.26 nvidia-cufft-cu12==11.0.2.54 nvidia-curand-cu12==10.3.2.106 nvidia-cusolver-cu12==11.4.5.107 nvidia-cusparse-cu12==12.1.0.106 nvidia-ml-py==12.550.52 nvidia-nccl-cu12==2.20.5 nvidia-nvjitlink-cu12==12.4.127 nvidia-nvtx-cu12==12.1.105 openai==1.28.1 opencv-python==4.10.0.82 openpyxl==3.1.3 optimum==1.19.1 orjson==3.10.3 oss2==2.18.5 outlines==0.0.34 overrides==7.7.0 packaging==24.0 pandas==2.2.2 pandocfilters==1.5.1 parso==0.8.4 PasteDeploy==3.1.0 peft==0.11.1 pexpect==4.9.0 phx-class-registry==4.1.0 pillow==10.3.0 platformdirs==4.2.1 plotly==5.22.0 ply==3.11 portalocker==2.8.2 prometheus-fastapi-instrumentator==7.0.0 prometheus_client==0.20.0 prompt_toolkit==3.0.47 protobuf==4.25.0 psutil==5.9.8 ptyprocess==0.7.0 pure-eval==0.2.2 py-cpuinfo==9.0.0 pyarrow==16.0.0 pyarrow-hotfix==0.6 pycparser==2.22 pycryptodome==3.20.0 pydantic==2.7.1 pydantic_core==2.18.2 pydeck==0.9.1 pydub==0.25.1 pyeclib==1.6.1 Pygments==2.18.0 Pympler==1.0.1 pynvml==11.5.0 
pyparsing==3.1.2 python-dateutil==2.9.0.post0 python-dotenv==1.0.1 python-json-logger==2.0.7 python-multipart==0.0.9 pytz==2024.1 PyYAML==6.0.1 pyzmq==26.0.3 qtconsole==5.5.2 QtPy==2.4.1 quantile-python==1.1 ray==2.21.0 referencing==0.35.1 regex==2024.4.28 requests==2.31.0 requests-toolbelt==1.0.0 rfc3339-validator==0.1.4 rfc3986-validator==0.1.1 rich==13.7.1 rouge==1.0.1 rouge-chinese==1.0.3 rouge-score==0.1.2 rpds-py==0.18.1 ruff==0.4.3 sacrebleu==2.4.2 safetensors==0.4.3 scikit-learn==1.5.0 scipy==1.13.0 seaborn==0.13.2 semantic-version==2.10.0 Send2Trash==1.8.3 sentencepiece==0.2.0 sgmllib3k==1.0.0 shellingham==1.5.4 shtab==1.7.1 simple-ddl-parser==1.5.1 simplejson==3.19.2 six==1.16.0 smmap==5.0.1 sniffio==1.3.1 sortedcontainers==2.4.0 soupsieve==2.5 stack-data==0.6.3 starlette==0.37.2 streamlit==1.35.0 swift==2.33.0 sympy==1.12 tabulate==0.9.0 tenacity==8.3.0 tensorboard==2.16.2 tensorboard-data-server==0.7.2 termcolor==2.4.0 terminado==0.18.1 threadpoolctl==3.5.0 tiktoken==0.6.0 tinycss2==1.3.0 tokenizers==0.19.1 toml==0.10.2 tomli==2.0.1 tomlkit==0.12.0 toolz==0.12.1 torch==2.3.0 torchvision==0.18.1 tornado==6.4.1 tqdm==4.66.4 traitlets==5.14.3 transformers==4.40.1 transformers-stream-generator==0.0.5 triton==2.3.0 trl==0.8.6 typer==0.12.3 types-python-dateutil==2.9.0.20240316 typing_extensions==4.11.0 tyro==0.8.3 tzdata==2024.1 ujson==5.9.0 uri-template==1.3.0 urllib3==2.2.1 uvicorn==0.29.0 uvloop==0.19.0 vllm==0.4.3 vllm-flash-attn==2.5.8.post2 vllm_nccl_cu12==2.18.1.0.4.0 watchdog==4.0.1 watchfiles==0.21.0 wcwidth==0.2.13 webcolors==24.6.0 webencodings==0.5.1 websocket-client==1.8.0 websockets==11.0.3 Werkzeug==3.0.3 widgetsnbextension==4.0.11 xattr==1.1.0 xformers==0.0.26.post1 xtuner==0.1.11 xxhash==3.4.1 yapf==0.40.2 yarl==1.9.4 zipp==3.18.1 zstandard==0.22.0

Additional context
Add any other context about the problem here

edgeinfinity-wzt commented 5 months ago

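This is the same as https://github.com/modelscope/swift/issues/1111. I reproduced the same problem on two machines. I have already tried lowering quant_n_samples and quant_seqlen, and no GPU out-of-memory error was reported when the problem occurred.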
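When a run ends without a traceback, the exit status of the process is often the only clue left. The following is a minimal diagnostic sketch, not something proposed in this thread: it runs any command, stores its combined output in a log file, and prints the exit status. In common Linux shells a status above 128 means the process was ended by a signal (status minus 128), so 137 would mean SIGKILL, which is the signal the kernel's out-of-memory killer sends. Because the reported command sets --quant_device_map cpu, host RAM rather than GPU memory may be worth ruling out. `run_logged` and the log file name are hypothetical names chosen for illustration.

```shell
# run_logged LOGFILE COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND, saves stdout and stderr to LOGFILE,
# and reports how the process ended.
run_logged() {
  log_file="$1"
  shift
  # Capture both output streams in the log file.
  "$@" >"$log_file" 2>&1
  status=$?
  if [ "$status" -gt 128 ]; then
    # Statuses above 128 encode the terminating signal number.
    echo "exit status $status: terminated by signal $((status - 128))"
  else
    echo "exit status $status"
  fi
  return "$status"
}
```

It could be used as `run_logged quant.log swift export --ckpt_dir "$bmc" ...` with the same flags as in the report. Output goes only to the log file, so progress has to be followed there.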

lxb0425 commented 5 months ago

I am running into this problem too: the run stops for no apparent reason. How should I deal with it?

edgeinfinity-wzt commented 5 months ago

Quoting lxb0425: "I am running into this problem too: the run stops for no apparent reason. How should I deal with it?"

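I still have not received any reply, and I cannot find any workaround to refer to. As things stand, both quantization schemes for qwen2-72b-instruct have problems, and I do not know when the maintainers will fix this.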

tastelikefeet commented 2 months ago

Reference: https://swift.readthedocs.io/zh-cn/latest/LLM/LLM%E9%87%8F%E5%8C%96%E4%B8%8E%E5%AF%BC%E5%87%BA%E6%96%87%E6%A1%A3.html

OMP_NUM_THREADS=14
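One way to apply this suggestion, as a sketch: set OMP_NUM_THREADS in the environment of the export process by prefixing the original command. The flags are copied from the report at the top of this issue and the value 14 is the one given in this comment. The `run_quant` wrapper and the `DRY_RUN` switch are hypothetical helpers, added only so the full command can be inspected without starting a run.

```shell
# run_quant CKPT_DIR
# Hypothetical wrapper around the reported command, with OMP_NUM_THREADS set.
# With DRY_RUN=1 the command is printed instead of executed.
run_quant() {
  ckpt_dir="$1"
  if [ "${DRY_RUN:-0}" = "1" ]; then runner="echo"; else runner=""; fi
  # $runner is left unquoted on purpose: when empty it expands to nothing.
  $runner env OMP_NUM_THREADS=14 CUDA_VISIBLE_DEVICES=0,1 \
    swift export \
    --ckpt_dir "$ckpt_dir" \
    --quant_method gptq \
    --quant_bits 4 \
    --load_args_from_ckpt_dir True \
    --load_dataset_config True \
    --quant_device_map cpu \
    --quant_n_samples 32 \
    --quant_seqlen 32 \
    --max_length 256 \
    --dataset dataset/moni_dataset.jsonl
}
```

Running `DRY_RUN=1 run_quant /path/to/checkpoint` prints the command; without DRY_RUN it is executed. Whether 14 is the right thread count for a given machine is not stated in the thread.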

modelscope / ms-swift

Quantization stops for no reason when GPTQ quantizing Qwen2-72b-Instruct #1121