xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0
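The "single line of code" claim works because Xinference exposes an OpenAI-compatible REST API. A minimal sketch of the swap, assuming a local Xinference server on the default port 9997 (the port also appears in the traceback below) and a launched model with the UID `baichuan-7b` (the UID is an assumption for illustration):

```python
# Hedged sketch: point the standard OpenAI client at a local Xinference
# server instead of api.openai.com. Only base_url changes.
import openai

client = openai.OpenAI(
    base_url="http://localhost:9997/v1",  # Xinference's OpenAI-compatible endpoint
    api_key="not-needed",                 # a placeholder; no real key is required locally
)

response = client.chat.completions.create(
    model="baichuan-7b",  # assumed model UID; use the UID you launched
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```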

QUESTION #1016

Open 1102asd opened 5 months ago

1102asd commented 5 months ago

I downloaded a baichuan-7b model myself:

2024-02-20 06:17:56,078 xinference.model.llm.llm_family 941596 INFO Caching from URI: /home/clouditera/jupyter/hsc_test/test/inference/models/baichuan-7b
/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
Batches: 100%|██████████| 1/1 [00:01<00:00, 1.18s/it]
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Batches: 100%|██████████| 1/1 [00:00<00:00, 67.51it/s]
Batches: 100%|██████████| 1/1 [00:00<00:00, 69.98it/s]
Traceback (most recent call last):
  File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/gradio/queueing.py", line 495, in call_prediction
    output = await route_utils.call_process_api(
  File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/gradio/route_utils.py", line 233, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/gradio/blocks.py", line 1608, in process_api
    result = await self.call_function(
  File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/gradio/blocks.py", line 1188, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/gradio/utils.py", line 513, in async_iteration
    return await iterator.__anext__()
  File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/gradio/utils.py", line 639, in asyncgen_wrapper
    response = await iterator.__anext__()
  File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/gradio/chat_interface.py", line 487, in _stream_fn
    first_response = await async_iteration(generator)
  File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/gradio/utils.py", line 513, in async_iteration
    return await iterator.__anext__()
  File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/gradio/utils.py", line 506, in __anext__
    return await anyio.to_thread.run_sync(
  File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
  File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/gradio/utils.py", line 489, in run_sync_iterator_async
    return next(iterator)
  File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/xinference/core/chat_interface.py", line 123, in generate_wrapper
    for chunk in model.chat(
  File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/xinference/client/restful/restful_client.py", line 392, in chat
    raise RuntimeError(
RuntimeError: Failed to generate chat completion, detail: 500 Server Error: Internal Server Error for url: http://localhost:9997/v1/chat/completions

How should I fix this?

1102asd commented 5 months ago

CUDA 12.0 [screenshot]
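For environment reports like this, exact library versions plus the CUDA build PyTorch was compiled against are usually more useful than a screenshot. A small sketch (nothing here is from the thread; it just collects the versions the thread discusses):

```python
# Hedged sketch: print the version info relevant to this issue.
from importlib.metadata import version

import torch

print("torch:", torch.__version__, "| CUDA build:", torch.version.cuda)
print("xinference:", version("xinference"))
print("gradio:", version("gradio"))
```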

1102asd commented 5 months ago

[screenshot]

wolfibmmbi commented 5 months ago

Same problem here, looking for a fix. Environment: I have tried everything from Python 3.9 to Python 3.11 with CUDA 12.1. Same result every time.

ChengjieLi28 commented 5 months ago

@wolfibmmbi @1102asd Upgrade to 0.9.0; this issue should already be fixed there.
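For reference, a minimal upgrade-and-verify sketch run inside the affected environment (the `[all]` extras group is an assumption about how Xinference was originally installed):

```python
# Hedged sketch: upgrade Xinference in the current environment, then verify.
# Equivalent to running `pip install -U "xinference[all]"` in a shell.
import subprocess
import sys
from importlib.metadata import version

subprocess.check_call(
    [sys.executable, "-m", "pip", "install", "--upgrade", "xinference[all]"]
)
print(version("xinference"))  # expect 0.9.0 or later
```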