
cogvlm2, internvl-chat error #1619

Open Savid-mask777 opened 1 month ago

Savid-mask777 commented 1 month ago

Describe the bug

cogvlm2 downloaded from the ModelScope source through Xinference raises an error, and internvl-chat errors as well; both models run fine when invoked directly with transformers code.

To Reproduce

  1. Python 3.10.14
  2. Tesla V100-SXM2-32GB × 8, CUDA 12.2
  3. gcc 9.3.1
  4. CentOS 7.9.2009 (Core)

    File "/root/Download/conda/envs/gpt/bin/xinference", line 8, in sys.exit(cli()) File "/root/Download/conda/envs/gpt/lib/python3.10/site-packages/click/core.py", line 1157, in call return self.main(args, kwargs) File "/root/Download/conda/envs/gpt/lib/python3.10/site-packages/click/core.py", line 1078, in main rv = self.invoke(ctx) File "/root/Download/conda/envs/gpt/lib/python3.10/site-packages/click/core.py", line 1688, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "/root/Download/conda/envs/gpt/lib/python3.10/site-packages/click/core.py", line 1434, in invoke return ctx.invoke(self.callback, ctx.params) File "/root/Download/conda/envs/gpt/lib/python3.10/site-packages/click/core.py", line 783, in invoke return __callback(args, **kwargs) File "/root/Download/conda/envs/gpt/lib/python3.10/site-packages/xinference/deploy/cmdline.py", line 1069, in model_generate loop.run_until_complete(task) File "/root/Download/conda/envs/gpt/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete return future.result() File "/root/Download/conda/envs/gpt/lib/python3.10/site-packages/xinference/deploy/cmdline.py", line 1046, in generate_internal for chunk in model.generate( File "/root/Download/conda/envs/gpt/lib/python3.10/site-packages/xinference/client/common.py", line 51, in streaming_response_iterator raise Exception(str(error)) Exception: [address=0.0.0.0:33865, pid=87050] Command '['/usr/bin/gcc', '/tmp/tmp4t3k0di3/main.c', '-O3', '-I/root/Download/conda/envs/gpt/lib/python3.10/site-packages/triton/common/../third_party/cuda/include', '-I/root/Download/conda/envs/gpt/include/python3.10', '-I/tmp/tmp4t3k0di3', '-shared', '-fPIC', '-lcuda', '-o', '/tmp/tmp4t3k0di3/cuda_utils.cpython-310-x86_64-linux-gnu.so', '-L/lib64', '-L/lib', '-L/lib64', '-L/lib']' returned non-zero exit status 1.

Expected behavior

cogvlm2 and internvl-chat downloaded from the ModelScope source should run through Xinference without errors, just as they do when run locally with transformers code.

qinxuye commented 3 weeks ago

This looks like an environment problem; execution never reached Xinference's model inference stage.
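For anyone else hitting this: Triton JIT-compiles a small CUDA helper and links it with `-lcuda`, so the build-time linker (not just the runtime loader) must be able to find an unversioned `libcuda.so` on its search path; the `-L/lib64 -L/lib` entries in the failing command suggest it cannot. A diagnostic sketch follows; the candidate directories are assumptions about common driver/CUDA layouts, not paths confirmed on this machine:

```python
import ctypes.util
import glob

# The runtime loader resolving libcuda does not guarantee the linker can:
# `gcc -lcuda` needs an unversioned libcuda.so, while driver packages often
# ship only libcuda.so.1.
print("runtime loader finds:", ctypes.util.find_library("cuda"))

# Candidate locations (assumed typical layouts; adjust for your install):
patterns = [
    "/usr/lib64/libcuda.so*",
    "/usr/lib64/nvidia/libcuda.so*",
    "/usr/lib/x86_64-linux-gnu/libcuda.so*",
    "/usr/local/cuda/lib64/stubs/libcuda.so*",
]
for p in patterns:
    print(p, "->", glob.glob(p))

# If only libcuda.so.1 turns up, pointing the linker at the CUDA stubs
# directory before launching Xinference is a common workaround, e.g.:
#   export LIBRARY_PATH=/usr/local/cuda/lib64/stubs:$LIBRARY_PATH
```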