xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0
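The "single line of code" claim works because Xinference exposes an OpenAI-compatible REST API. A minimal sketch of the swap, assuming a local Xinference server on the default port 9997 (the port also appears in the traceback below) and a launched model with the UID `baichuan-7b` (the UID is an assumption for illustration):

```python
# Hedged sketch: point the standard OpenAI client at a local Xinference
# server instead of api.openai.com. Only base_url changes.
import openai

client = openai.OpenAI(
    base_url="http://localhost:9997/v1",  # Xinference's OpenAI-compatible endpoint
    api_key="not-needed",                 # a placeholder; no real key is required locally
)

response = client.chat.completions.create(
    model="baichuan-7b",  # assumed model UID; use the UID you launched
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```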

QUESTION #1016

Open 1102asd opened 5 months ago

1102asd commented 5 months ago

I downloaded a baichuan-7b model myself:

2024-02-20 06:17:56,078 xinference.model.llm.llm_family 941596 INFO Caching from URI: /home/clouditera/jupyter/hsc_test/test/inference/models/baichuan-7b
/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
Batches: 100%|██████████| 1/1 [00:01<00:00, 1.18s/it]
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Batches: 100%|██████████| 1/1 [00:00<00:00, 67.51it/s]
Batches: 100%|██████████| 1/1 [00:00<00:00, 69.98it/s]
Traceback (most recent call last):
  File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/gradio/queueing.py", line 495, in call_prediction
    output = await route_utils.call_process_api(
  File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/gradio/route_utils.py", line 233, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/gradio/blocks.py", line 1608, in process_api
    result = await self.call_function(
  File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/gradio/blocks.py", line 1188, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/gradio/utils.py", line 513, in async_iteration
    return await iterator.__anext__()
  File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/gradio/utils.py", line 639, in asyncgen_wrapper
    response = await iterator.__anext__()
  File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/gradio/chat_interface.py", line 487, in _stream_fn
    first_response = await async_iteration(generator)
  File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/gradio/utils.py", line 513, in async_iteration
    return await iterator.__anext__()
  File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/gradio/utils.py", line 506, in __anext__
    return await anyio.to_thread.run_sync(
  File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
  File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/gradio/utils.py", line 489, in run_sync_iterator_async
    return next(iterator)
  File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/xinference/core/chat_interface.py", line 123, in generate_wrapper
    for chunk in model.chat(
  File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/xinference/client/restful/restful_client.py", line 392, in chat
    raise RuntimeError(
RuntimeError: Failed to generate chat completion, detail: 500 Server Error: Internal Server Error for url: http://localhost:9997/v1/chat/completions

How should I fix this?

1102asd commented 5 months ago

CUDA 12.0 [screenshot]
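For environment reports like this, exact library versions plus the CUDA build PyTorch was compiled against are usually more useful than a screenshot. A small sketch (nothing here is from the thread; it just collects the versions the thread discusses):

```python
# Hedged sketch: print the version info relevant to this issue.
from importlib.metadata import version

import torch

print("torch:", torch.__version__, "| CUDA build:", torch.version.cuda)
print("xinference:", version("xinference"))
print("gradio:", version("gradio"))
```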

1102asd commented 5 months ago

[screenshot]

wolfibmmbi commented 5 months ago

Same problem here, looking for a fix. Environment: I have tried everything from Python 3.9 to Python 3.11 with CUDA 12.1. Same result every time.

ChengjieLi28 commented 5 months ago

@wolfibmmbi @1102asd Upgrade to 0.9.0; this issue should already be fixed there.
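For reference, a minimal upgrade-and-verify sketch run inside the affected environment (the `[all]` extras group is an assumption about how Xinference was originally installed):

```python
# Hedged sketch: upgrade Xinference in the current environment, then verify.
# Equivalent to running `pip install -U "xinference[all]"` in a shell.
import subprocess
import sys
from importlib.metadata import version

subprocess.check_call(
    [sys.executable, "-m", "pip", "install", "--upgrade", "xinference[all]"]
)
print(version("xinference"))  # expect 0.9.0 or later
```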