xusenlinzy / api-for-open-llm

OpenAI-style API for open large language models: use LLMs just like ChatGPT! Supports LLaMA, LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, Xverse, SqlCoder, CodeLLaMA, ChatGLM, ChatGLM2, ChatGLM3, etc. A unified backend interface for open-source large language models.
Apache License 2.0
2.34k stars 269 forks

Deploying glm4-9b on 4 × 4090 GPUs: error when calling through the dify API #315

Open he498 opened 1 week ago

he498 commented 1 week ago

The following items must be checked before submission

Type of problem

Model inference and deployment

Operating system


Detailed description of the problem

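My hardware environment is 4 × 4090 GPUs with CUDA 12.1. An error occurs when the model is called through dify's external API; the dify endpoint I am using is the streaming one, /v1/chat-messages. The model is deployed with the transformers method from this project's code. [image]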
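For anyone trying to reproduce the call, here is a minimal sketch of the kind of streaming request described above. It only builds the request and does not send it. The host, API key, query and user values are placeholders, the helper name `build_chat_request` is hypothetical, and the payload fields follow dify's public chat-messages API as I understand it, so check them against your dify version.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, query: str, user: str) -> urllib.request.Request:
    """Build (but do not send) a streaming POST to dify's /v1/chat-messages."""
    payload = {
        "inputs": {},
        "query": query,
        "response_mode": "streaming",  # the streaming mode mentioned in the report
        "conversation_id": "",
        "user": user,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat-messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values (assumptions): replace with a real dify host and app key.
request = build_chat_request("http://localhost", "app-xxxx", "hello", "debug-user")
```

Passing `request` to `urllib.request.urlopen` would send it; the response is then a server-sent event stream that has to be read line by line.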


Package Version

accelerate 0.31.0
aiohttp 3.9.5
aiosignal 1.3.1
annotated-types 0.7.0
antlr4-python3-runtime 4.9.3
anyio 4.4.0
attrs 23.2.0
backoff 2.2.1
beautifulsoup4 4.12.3
bitsandbytes 0.42.0
certifi 2024.6.2
cffi 1.16.0
chardet 5.2.0
charset-normalizer 3.3.2
click 8.1.7
coloredlogs 15.0.1
contourpy 1.2.1
cpm-kernels 1.0.11
cryptography 42.0.8
cycler 0.12.1
dataclasses-json 0.6.7
dataclasses-json-speakeasy 0.5.11
Deprecated 1.2.14
distro 1.9.0
dnspython 2.6.1
effdet 0.4.1
einops 0.8.0
email_validator 2.1.2
emoji 2.12.1
et-xmlfile 1.1.0
fastapi 0.111.0
fastapi-cli 0.0.4
filelock 3.15.1
filetype 1.2.0
flatbuffers 24.3.25
fonttools 4.53.0
frozenlist 1.4.1
fsspec 2024.6.0
greenlet 3.0.3
h11 0.14.0
httpcore 1.0.5
httptools 0.6.1
httpx 0.27.0
huggingface-hub 0.23.4
humanfriendly 10.0
idna 3.7
iopath 0.1.10
Jinja2 3.1.4
joblib 1.4.2
jsonpatch 1.33
jsonpath-python 1.0.6
jsonpointer 3.0.0
kiwisolver 1.4.5
langchain 0.2.5
langchain-community 0.2.5
langchain-core 0.2.8
langchain-text-splitters 0.2.1
langdetect 1.0.9
langsmith 0.1.78
layoutparser 0.3.4
loguru 0.7.2
lxml 5.2.2
Markdown 3.6
markdown-it-py 3.0.0
MarkupSafe 2.1.5
marshmallow 3.21.3
matplotlib 3.9.0
mdurl 0.1.2
mpmath 1.3.0
msg-parser 1.2.0
multidict 6.0.5
mypy-extensions 1.0.0
networkx 3.3
nltk 3.8.1
numpy 1.26.4
nvidia-cublas-cu12
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12
nvidia-cufft-cu12
nvidia-curand-cu12
nvidia-cusolver-cu12
nvidia-cusparse-cu12
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.5.40
nvidia-nvtx-cu12 12.1.105
olefile 0.47
omegaconf 2.3.0
onnx 1.16.1
onnxruntime 1.15.1
openai 1.34.0
opencv-python
openparse 0.5.7
openpyxl 3.1.4
orjson 3.10.5
packaging 24.1
pandas 2.2.2
pdf2image 1.17.0
pdfminer.six 20231228
pdfplumber 0.11.1
peft 0.11.1
pikepdf 9.0.0
pillow 10.3.0
pillow_heif 0.16.0
pip 24.0
portalocker 2.8.2
protobuf 5.27.1
psutil 5.9.8
pyclipper 1.3.0.post5
pycocotools 2.0.8
pycparser 2.22
pydantic 2.7.4
pydantic_core 2.18.4
Pygments 2.18.0
PyMuPDF 1.24.5
PyMuPDFb 1.24.3
pypandoc 1.13
pyparsing 3.1.2
pypdf 4.2.0
pypdfium2 4.30.0
pytesseract 0.3.10
python-dateutil 2.9.0.post0
python-docx 1.1.2
python-dotenv 1.0.0
python-iso639 2024.4.27
python-magic 0.4.27
python-multipart 0.0.9
python-pptx 0.6.23
pytz 2024.1
PyYAML 6.0.1
rapidfuzz 3.9.3
rapidocr-onnxruntime 1.3.22
regex 2024.5.15
requests 2.32.3
rich 13.7.1
safetensors 0.4.3
scikit-learn 1.5.0
scipy 1.13.1
sentence-transformers 3.0.1
sentencepiece 0.2.0
setuptools 69.5.1
shapely 2.0.4
shellingham 1.5.4
six 1.16.0
sniffio 1.3.1
soupsieve 2.5
SQLAlchemy 2.0.30
sse-starlette 2.1.2
starlette 0.37.2
starlette-context 0.3.6
sympy 1.12.1
tabulate 0.9.0
tenacity 8.4.1
threadpoolctl 3.5.0
tiktoken 0.7.0
timm 1.0.3
tokenizers 0.19.1
torch 2.3.1
torchvision 0.18.1
tqdm 4.66.4
transformers 4.42.4
transformers-stream-generator 0.0.5
triton 2.3.1
typer 0.12.3
typing_extensions 4.12.2
typing-inspect 0.9.0
tzdata 2024.1
ujson 5.10.0
unstructured 0.13.2
unstructured-client 0.18.0
unstructured-inference 0.7.25
unstructured.pytesseract 0.3.12
urllib3 2.2.2
uvicorn 0.30.1
uvloop 0.19.0
watchfiles 0.22.0
websockets 12.0
wheel 0.43.0
wrapt 1.16.0
xlrd 2.0.1
XlsxWriter 3.2.0
yarl 1.9.4

Runtime logs or screenshots

Exception in thread Thread-4 (generate):
Traceback (most recent call last):
  File "/data/conda/aconda3/envs/glm4/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "/data/conda/aconda3/envs/glm4/lib/python3.11/threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "/data/conda/aconda3/envs/glm4/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/data/conda/aconda3/envs/glm4/lib/python3.11/site-packages/transformers/generation/utils.py", line 1914, in generate
    result = self._sample(
             ^^^^^^^^^^^^^
  File "/data/conda/aconda3/envs/glm4/lib/python3.11/site-packages/transformers/generation/utils.py", line 2693, in _sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: probability tensor contains either inf, nan or element < 0
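To make the condition named in the error message concrete, here is a small pure-Python sketch. It is not torch's implementation and it does not identify the cause in this deployment; the helper names `softmax` and `invalid_entries` are hypothetical. It only illustrates that a single NaN logit is enough to make every probability invalid, which is the state that sampling rejects.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def invalid_entries(probs):
    """Return indices of entries that are inf, nan, or negative."""
    return [
        i for i, p in enumerate(probs)
        if math.isnan(p) or math.isinf(p) or p < 0
    ]


good = softmax([1.0, 2.0, 3.0])           # a valid distribution
bad = softmax([1.0, float("nan"), 3.0])   # one NaN logit poisons the sum

print(invalid_entries(good))
print(invalid_entries(bad))
```

Running a similar check on the logits just before sampling would show whether the invalid values come from the model's forward pass or from the sampling parameters.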

he498 commented 1 week ago
