Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
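The "single line" in that blurb is the API base URL: Xinference exposes an OpenAI-compatible REST API, so an existing OpenAI client only needs to be pointed at the Xinference endpoint. A minimal sketch of the idea (the default local endpoint `http://localhost:9997` is an assumption, not a value from this post):

```python
# Sketch: Xinference serves an OpenAI-compatible API, so the only
# provider-specific detail in a chat call is the base URL.
OPENAI_BASE = "https://api.openai.com/v1"
XINFERENCE_BASE = "http://localhost:9997/v1"  # the one line you change

def chat_completions_url(base_url: str) -> str:
    # Both services expose the same OpenAI-style route.
    return base_url.rstrip("/") + "/chat/completions"

# With the official `openai` package this amounts to (not executed here):
#   client = OpenAI(base_url=XINFERENCE_BASE, api_key="not-used")
#   client.chat.completions.create(model="<model-uid>", messages=[...])
```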
I downloaded a baichuan-7b model myself:
2024-02-20 06:17:56,078 xinference.model.llm.llm_family 941596 INFO Caching from URI: /home/clouditera/jupyter/hsc_test/test/inference/models/baichuan-7b
/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
Batches: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.18s/it]
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Batches: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 67.51it/s]
Batches: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 69.98it/s]
Traceback (most recent call last):
File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/gradio/queueing.py", line 495, in call_prediction
output = await route_utils.call_process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/gradio/route_utils.py", line 233, in call_process_api
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/gradio/blocks.py", line 1608, in process_api
result = await self.call_function(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/gradio/blocks.py", line 1188, in call_function
prediction = await utils.async_iteration(iterator)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/gradio/utils.py", line 513, in async_iteration
return await iterator.__anext__()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/gradio/utils.py", line 639, in asyncgen_wrapper
response = await iterator.__anext__()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/gradio/chat_interface.py", line 487, in _stream_fn
first_response = await async_iteration(generator)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/gradio/utils.py", line 513, in async_iteration
return await iterator.__anext__()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/gradio/utils.py", line 506, in __anext__
return await anyio.to_thread.run_sync(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 851, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/gradio/utils.py", line 489, in run_sync_iterator_async
return next(iterator)
^^^^^^^^^^^^^^
File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/xinference/core/chat_interface.py", line 123, in generate_wrapper
for chunk in model.chat(
^^^^^^^^^^^
File "/home/clouditera/miniconda3/envs/xinference/lib/python3.11/site-packages/xinference/client/restful/restful_client.py", line 392, in chat
raise RuntimeError(
RuntimeError: Failed to generate chat completion, detail: 500 Server Error: Internal Server Error for url: http://localhost:9997/v1/chat/completions
How should I resolve this?
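The Gradio traceback above only shows the client-side symptom (a generic 500 from the REST client); the actual exception is raised inside the Xinference server process, so the server-side log (the terminal where the Xinference server was started) is where the real error appears. The failing request can also be reproduced outside Gradio to read the raw error body. A debugging sketch, not a definitive fix; the endpoint comes from the traceback, but the model UID "baichuan-7b" is a placeholder for whatever `xinference list` actually reports:

```python
import json

# Build the same kind of payload the Gradio chat UI sends, so the
# endpoint can be hit directly and its error body inspected.
payload = {
    "model": "baichuan-7b",  # placeholder: use the UID from `xinference list`
    "messages": [{"role": "user", "content": "你好"}],
}

def build_request_body(payload: dict) -> bytes:
    # Serialize the chat payload as a UTF-8 JSON request body.
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")

if __name__ == "__main__":
    import urllib.request
    req = urllib.request.Request(
        "http://localhost:9997/v1/chat/completions",  # endpoint from the traceback
        data=build_request_body(payload),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            print(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as e:
        # The 500 response body often contains the real server-side detail.
        print(e.code, e.read().decode("utf-8"))
```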