Open pandaTED opened 1 week ago
Additionally, concurrent requests to the `ip:port/v1` endpoint fail with:
Error code: 500 - {'detail': '[address=0.0.0.0:35395, pid=157] probability tensor contains either inf, nan or element < 0'}
Traceback:
Traceback (most recent call last):
File "D:\GLM-4-main\function_calling_demo\src\langchainClient_duojincheng.py", line 552, in process_string
result = llm_with_tools.invoke(wenti2)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\panda\miniconda3\envs\chatchat\Lib\site-packages\langchain_core\runnables\base.py", line 5092, in invoke
return self.bound.invoke(
^^^^^^^^^^^^^^^^^^
File "C:\Users\panda\miniconda3\envs\chatchat\Lib\site-packages\langchain_core\language_models\chat_models.py", line 276, in invoke
self.generate_prompt(
File "C:\Users\panda\miniconda3\envs\chatchat\Lib\site-packages\langchain_core\language_models\chat_models.py", line 776, in generate_prompt
return self.generate(prompt_messages, stop=stop, callbacks=callbacks, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\panda\miniconda3\envs\chatchat\Lib\site-packages\langchain_core\language_models\chat_models.py", line 633, in generate
raise e
File "C:\Users\panda\miniconda3\envs\chatchat\Lib\site-packages\langchain_core\language_models\chat_models.py", line 623, in generate
self._generate_with_cache(
File "C:\Users\panda\miniconda3\envs\chatchat\Lib\site-packages\langchain_core\language_models\chat_models.py", line 845, in _generate_with_cache
result = self._generate(
^^^^^^^^^^^^^^^
File "C:\Users\panda\miniconda3\envs\chatchat\Lib\site-packages\langchain_openai\chat_models\base.py", line 649, in _generate
response = self.client.create(payload)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\panda\miniconda3\envs\chatchat\Lib\site-packages\openai\_utils\_utils.py", line 274, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\panda\miniconda3\envs\chatchat\Lib\site-packages\openai\resources\chat\completions.py", line 668, in create
return self._post(
^^^^^^^^^^^
File "C:\Users\panda\miniconda3\envs\chatchat\Lib\site-packages\openai\_base_client.py", line 1260, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\panda\miniconda3\envs\chatchat\Lib\site-packages\openai\_base_client.py", line 937, in request
return self._request(
^^^^^^^^^^^^^^
File "C:\Users\panda\miniconda3\envs\chatchat\Lib\site-packages\openai\_base_client.py", line 1026, in _request
return self._retry_request(
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\panda\miniconda3\envs\chatchat\Lib\site-packages\openai\_base_client.py", line 1075, in _retry_request
return self._request(
^^^^^^^^^^^^^^
File "C:\Users\panda\miniconda3\envs\chatchat\Lib\site-packages\openai\_base_client.py", line 1026, in _request
return self._retry_request(
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\panda\miniconda3\envs\chatchat\Lib\site-packages\openai\_base_client.py", line 1075, in _retry_request
return self._request(
^^^^^^^^^^^^^^
File "C:\Users\panda\miniconda3\envs\chatchat\Lib\site-packages\openai\_base_client.py", line 1041, in _request
raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Error code: 500 - {'detail': '[address=0.0.0.0:35395, pid=157] probability tensor contains either inf, nan or element < 0'}
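For context, this 500 is the message PyTorch's sampling step (`torch.multinomial`) raises when the token distribution it is asked to sample from has been corrupted. A minimal pure-Python sketch of that validity check (my own illustration, not xinference or PyTorch code):

```python
import math

def is_valid_probability_vector(probs):
    """Sketch of the sanity check a sampling kernel applies before
    drawing a token: every entry must be finite and non-negative."""
    return all(not math.isnan(p) and not math.isinf(p) and p >= 0
               for p in probs)

# A distribution corrupted under concurrent generation (e.g. by a race
# on shared buffers) can contain NaN/inf, tripping the server-side check:
print(is_valid_probability_vector([0.7, 0.2, 0.1]))           # True
print(is_valid_probability_vector([0.7, float("nan"), 0.1]))  # False
```

That the vector only goes bad under concurrent requests is consistent with the model worker not being thread-safe in this code path.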
Below is the error xinference itself reports when the backend receives the concurrent requests:
xinference-1 | 2024-09-09 01:41:20,027 xinference.api.restful_api 1 ERROR [address=0.0.0.0:35395, pid=157] CUDA error: device-side assert triggered
xinference-1 | CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
xinference-1 | For debugging consider passing CUDA_LAUNCH_BLOCKING=1
xinference-1 | Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
xinference-1 | Traceback (most recent call last):
xinference-1 | File "/usr/local/lib/python3.10/dist-packages/xinference/api/restful_api.py", line 1720, in create_chat_completion
xinference-1 | data = await model.chat(
xinference-1 | File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 231, in send
xinference-1 | return self._process_result_message(result)
xinference-1 | File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 102, in _process_result_message
xinference-1 | raise message.as_instanceof_cause()
xinference-1 | File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 656, in send
xinference-1 | result = await self._run_coro(message.message_id, coro)
xinference-1 | File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 367, in _run_coro
xinference-1 | return await coro
xinference-1 | File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 384, in on_receive
xinference-1 | return await super().on_receive(message) # type: ignore
xinference-1 | File "xoscar/core.pyx", line 558, in __on_receive__
xinference-1 | raise ex
xinference-1 | File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.on_receive
xinference-1 | async with self._lock:
xinference-1 | File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.on_receive
xinference-1 | with debug_async_timeout('actor_lock_timeout',
xinference-1 | File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive
xinference-1 | result = await result
xinference-1 | File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 96, in wrapped_func
xinference-1 | ret = await fn(self, *args, **kwargs)
xinference-1 | File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 462, in _wrapper
xinference-1 | r = await func(self, *args, **kwargs)
xinference-1 | File "/usr/local/lib/python3.10/dist-packages/xinference/core/utils.py", line 69, in wrapped
xinference-1 | ret = await func(*args, **kwargs)
xinference-1 | File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 560, in chat
xinference-1 | response = await self._call_wrapper_json(
xinference-1 | File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 407, in _call_wrapper_json
xinference-1 | return await self._call_wrapper("json", fn, *args, **kwargs)
xinference-1 | File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 120, in _async_wrapper
xinference-1 | return await fn(*args, **kwargs)
xinference-1 | File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 418, in _call_wrapper
xinference-1 | ret = await asyncio.to_thread(fn, *args, **kwargs)
xinference-1 | File "/usr/lib/python3.10/asyncio/threads.py", line 25, in to_thread
xinference-1 | return await loop.run_in_executor(None, func_call)
xinference-1 | File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
xinference-1 | result = self.fn(*self.args, **self.kwargs)
xinference-1 | File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/transformers/chatglm.py", line 365, in chat
xinference-1 | inputs = inputs.to(self._model.device)
xinference-1 | File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 803, in to
xinference-1 | self.data = {k: v.to(device=device) for k, v in self.data.items() if v is not None}
xinference-1 | File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 803, in <dictcomp>
xinference-1 | self.data = {k: v.to(device=device) for k, v in self.data.items() if v is not None}
xinference-1 | RuntimeError: CUDA error: device-side assert triggered
xinference-1 | CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
xinference-1 | For debugging consider passing CUDA_LAUNCH_BLOCKING=1
xinference-1 | Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
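As the trace itself notes, asynchronous kernel launches mean the Python frame shown may not be where the assert actually fired. Rerunning the container with `CUDA_LAUNCH_BLOCKING=1` would pin the stacktrace to the real call site. A sketch of the compose change (field names assumed to match the docker-compose file in the system info below; debugging only, as it slows inference):

```
services:
  xinference:
    environment:
      # Make CUDA kernel launches synchronous so the stacktrace
      # points at the kernel that actually failed.
      - CUDA_LAUNCH_BLOCKING=1
```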
System Info / 系統信息
xinference:0.15 langchain 0.2.14 langchain-core 0.2.35 langchain-experimental 0.0.58 langchain-openai 0.1.22
Running Xinference with Docker? / 是否使用 Docker 运行 Xinference?
Yes, via docker-compose (see the startup command below).
Version info / 版本信息
xinference:0.15 langchain 0.2.14 langchain-core 0.2.35 langchain-experimental 0.0.58 langchain-openai 0.1.22
The command used to start Xinference / 用以启动 xinference 的命令
The docker-compose file:
```yaml
services:
  xinference:
    image: xprobe/xinference:v0.15.0
    restart: always
    command: xinference-local -H 0.0.0.0
    # ports:  # can be enabled when not using host network
    network_mode: "host"
    # Mount the local path (~/xinference) into the container (/root/.xinference).
    # See: https://inference.readthedocs.io/zh-cn/latest/getting_started/using_docker_image.html
    volumes:
      - ~/xinference/cache/huggingface:/root/.cache/huggingface
      - ~/xinference/cache/modelscope:/root/.cache/modelscope
    deploy:
      resources:
        reservations:
          devices:
    environment:
      # Change the model source to ModelScope (default is HuggingFace)
```
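The entries under `environment:` did not survive the paste; per the xinference Docker documentation, the ModelScope switch referred to in the comment is typically set like this (my reconstruction, verify against the actual file):

```
    environment:
      - XINFERENCE_MODEL_SRC=modelscope  # default model source is HuggingFace
```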
Reproduction / 复现过程
Expected behavior / 期待表现
Concurrent requests should return normal output, as they did with xinference 0.14 in the same setup.
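Until the concurrency issue is resolved server-side, one workaround is to gate requests on the client so that only one call reaches the transformers backend at a time. A sketch, where the `invoke` callable stands in for `llm_with_tools.invoke` from the issue's langchain code:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_gate = threading.Semaphore(1)  # at most one in-flight request

def invoke_serialized(invoke, prompt):
    """Serialize calls to an endpoint that is unsafe under concurrency."""
    with _gate:
        return invoke(prompt)

# Demo with a stand-in callable; in the issue's code this would wrap
# llm_with_tools.invoke instead of str.upper.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda p: invoke_serialized(str.upper, p),
                            ["q1", "q2", "q3"]))
print(results)  # ['Q1', 'Q2', 'Q3']
```

Raising the semaphore count trades safety for throughput; with `Semaphore(1)` this effectively restores the one-request-at-a-time behavior that worked under 0.14.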