xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

BUG: probability tensor contains either `inf`, `nan` or element < 0 #733

Closed. codingl2k1 closed this issue 3 weeks ago.

codingl2k1 commented 9 months ago

Describe the bug


- torch: 2.2.0 dev
- model: llama-2-chat 13b (none)
- platform: linux
- max_tokens: 4096

Traceback (most recent call last):
  File "/home/codingl2k1/inference/xinference/api/restful_api.py", line 822, in stream_results
    async for item in iterator:
  File "/home/codingl2k1/inference/xinference/core/model.py", line 105, in __anext__
    return await self._model_actor_ref.next(self._uid)
  File "/home/codingl2k1/.pyenv/versions/3.9.18/lib/python3.9/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/home/codingl2k1/.pyenv/versions/3.9.18/lib/python3.9/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/home/codingl2k1/.pyenv/versions/3.9.18/lib/python3.9/site-packages/xoscar/backends/pool.py", line 657, in send
    result = await self._run_coro(message.message_id, coro)
  File "/home/codingl2k1/.pyenv/versions/3.9.18/lib/python3.9/site-packages/xoscar/backends/pool.py", line 368, in _run_coro
    return await coro
  File "/home/codingl2k1/.pyenv/versions/3.9.18/lib/python3.9/site-packages/xoscar/api.py", line 306, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/home/codingl2k1/inference/xinference/core/utils.py", line 33, in wrapped
    ret = await func(*args, **kwargs)
  File "/home/codingl2k1/inference/xinference/core/model.py", line 373, in next
    r = await self._call_wrapper(_wrapper)
  File "/home/codingl2k1/inference/xinference/core/model.py", line 200, in _call_wrapper
    return await asyncio.to_thread(_wrapper)
  File "/home/codingl2k1/.pyenv/versions/3.9.18/lib/python3.9/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/home/codingl2k1/.pyenv/versions/3.9.18/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/codingl2k1/inference/xinference/core/model.py", line 362, in _wrapper
    return next(gen)
  File "/home/codingl2k1/inference/xinference/model/llm/utils.py", line 256, in _to_chat_completion_chunks
    for i, chunk in enumerate(chunks):
  File "/home/codingl2k1/inference/xinference/model/llm/pytorch/core.py", line 270, in generator_wrapper
    for completion_chunk, _ in generate_stream(
  File "/home/codingl2k1/.pyenv/versions/3.9.18/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 56, in generator_context
    response = gen.send(request)
  File "/home/codingl2k1/inference/xinference/model/llm/pytorch/utils.py", line 203, in generate_stream
    indices = torch.multinomial(probs, num_samples=2)
RuntimeError: [address=0.0.0.0:44941, pid=510473] probability tensor contains either `inf`, `nan` or element < 0
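For context on where the invalid probabilities come from: `torch.multinomial` raises this error when its input contains `inf`, `NaN`, or negative entries, which typically happens when the logits overflow (for example in float16) and softmax then propagates `NaN` through the whole distribution. A minimal sketch, independent of xinference:

```python
import torch

# Minimal sketch (not from the issue): an inf in the logits makes
# softmax return NaNs, and torch.multinomial then rejects the tensor.
logits = torch.tensor([1.0, float("inf"), 2.0])
probs = torch.softmax(logits, dim=-1)    # every entry becomes NaN
torch.multinomial(probs, num_samples=2)  # raises RuntimeError (on GPU,
                                         # the message from the traceback above)
```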

To Reproduce

Python version: 3.9.18


codingl2k1 commented 9 months ago

https://github.com/facebookresearch/llama/issues/380
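The linked thread discusses the same failure mode inside the sampling step. One mitigation sometimes suggested is to sanitize the probability tensor before sampling; a hypothetical guard around the `torch.multinomial` call (an illustration only, not xinference's actual fix) might look like:

```python
import torch

def safe_multinomial(probs: torch.Tensor, num_samples: int) -> torch.Tensor:
    # Hypothetical helper (not part of xinference): if the probability
    # tensor is invalid, fall back to the largest finite entries instead
    # of letting torch.multinomial raise.
    invalid = ~torch.isfinite(probs) | (probs < 0)
    if bool(invalid.any()):
        cleaned = probs.nan_to_num(nan=0.0, posinf=0.0, neginf=0.0).clamp(min=0)
        return torch.topk(cleaned, k=num_samples).indices
    return torch.multinomial(probs, num_samples=num_samples)
```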

SwarmKit commented 7 months ago

Loading the deepseek-coder-33b-instruct model also triggers this error. Could an option be added so we can call model.bfloat16() ourselves?
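For reference, a minimal sketch of that workaround using the Hugging Face transformers loader; the model ID and arguments below are assumptions for illustration, not an existing xinference option:

```python
import torch
from transformers import AutoModelForCausalLM

# Sketch of the requested workaround: load the weights in bfloat16,
# which avoids the fp16 overflow that commonly produces inf/NaN logits.
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-coder-33b-instruct",  # assumed model ID
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```

bfloat16 trades mantissa precision for the same exponent range as float32, which is why it sidesteps the overflow that float16 hits on large logit values.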

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] commented 3 weeks ago

This issue was closed because it has been inactive for 5 days since being marked as stale.