xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

BUG: probability tensor contains either `inf`, `nan` or element < 0 #733

Closed. codingl2k1 closed this issue 3 weeks ago.

codingl2k1 commented 9 months ago

Describe the bug


- torch: 2.2.0 dev
- model: llama-2-chat 13b (none)
- platform: linux
- max_tokens: 4096

Traceback (most recent call last):
  File "/home/codingl2k1/inference/xinference/api/restful_api.py", line 822, in stream_results
    async for item in iterator:
  File "/home/codingl2k1/inference/xinference/core/model.py", line 105, in __anext__
    return await self._model_actor_ref.next(self._uid)
  File "/home/codingl2k1/.pyenv/versions/3.9.18/lib/python3.9/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/home/codingl2k1/.pyenv/versions/3.9.18/lib/python3.9/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/home/codingl2k1/.pyenv/versions/3.9.18/lib/python3.9/site-packages/xoscar/backends/pool.py", line 657, in send
    result = await self._run_coro(message.message_id, coro)
  File "/home/codingl2k1/.pyenv/versions/3.9.18/lib/python3.9/site-packages/xoscar/backends/pool.py", line 368, in _run_coro
    return await coro
  File "/home/codingl2k1/.pyenv/versions/3.9.18/lib/python3.9/site-packages/xoscar/api.py", line 306, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/home/codingl2k1/inference/xinference/core/utils.py", line 33, in wrapped
    ret = await func(*args, **kwargs)
  File "/home/codingl2k1/inference/xinference/core/model.py", line 373, in next
    r = await self._call_wrapper(_wrapper)
  File "/home/codingl2k1/inference/xinference/core/model.py", line 200, in _call_wrapper
    return await asyncio.to_thread(_wrapper)
  File "/home/codingl2k1/.pyenv/versions/3.9.18/lib/python3.9/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/home/codingl2k1/.pyenv/versions/3.9.18/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/codingl2k1/inference/xinference/core/model.py", line 362, in _wrapper
    return next(gen)
  File "/home/codingl2k1/inference/xinference/model/llm/utils.py", line 256, in _to_chat_completion_chunks
    for i, chunk in enumerate(chunks):
  File "/home/codingl2k1/inference/xinference/model/llm/pytorch/core.py", line 270, in generator_wrapper
    for completion_chunk, _ in generate_stream(
  File "/home/codingl2k1/.pyenv/versions/3.9.18/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 56, in generator_context
    response = gen.send(request)
  File "/home/codingl2k1/inference/xinference/model/llm/pytorch/utils.py", line 203, in generate_stream
    indices = torch.multinomial(probs, num_samples=2)
RuntimeError: [address=0.0.0.0:44941, pid=510473] probability tensor contains either `inf`, `nan` or element < 0
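For context on where the invalid probabilities come from: `torch.multinomial` raises this error when its input contains `inf`, `NaN`, or negative entries, which typically happens when the logits overflow (for example in float16) and softmax then propagates `NaN` through the whole distribution. A minimal sketch, independent of xinference:

```python
import torch

# Minimal sketch (not from the issue): an inf in the logits makes
# softmax return NaNs, and torch.multinomial then rejects the tensor.
logits = torch.tensor([1.0, float("inf"), 2.0])
probs = torch.softmax(logits, dim=-1)    # every entry becomes NaN
torch.multinomial(probs, num_samples=2)  # raises RuntimeError (on GPU,
                                         # the message from the traceback above)
```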

To Reproduce

Python version: 3.9.18


codingl2k1 commented 9 months ago

https://github.com/facebookresearch/llama/issues/380
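The linked thread discusses the same failure mode inside the sampling step. One mitigation sometimes suggested is to sanitize the probability tensor before sampling; a hypothetical guard around the `torch.multinomial` call (an illustration only, not xinference's actual fix) might look like:

```python
import torch

def safe_multinomial(probs: torch.Tensor, num_samples: int) -> torch.Tensor:
    # Hypothetical helper (not part of xinference): if the probability
    # tensor is invalid, fall back to the largest finite entries instead
    # of letting torch.multinomial raise.
    invalid = ~torch.isfinite(probs) | (probs < 0)
    if bool(invalid.any()):
        cleaned = probs.nan_to_num(nan=0.0, posinf=0.0, neginf=0.0).clamp(min=0)
        return torch.topk(cleaned, k=num_samples).indices
    return torch.multinomial(probs, num_samples=num_samples)
```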

SwarmKit commented 7 months ago

Loading the deepseek-coder-33b-instruct model also triggers this error. Could an option be added so we can call model.bfloat16() ourselves?
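For reference, a minimal sketch of that workaround using the Hugging Face transformers loader; the model ID and arguments below are assumptions for illustration, not an existing xinference option:

```python
import torch
from transformers import AutoModelForCausalLM

# Sketch of the requested workaround: load the weights in bfloat16,
# which avoids the fp16 overflow that commonly produces inf/NaN logits.
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-coder-33b-instruct",  # assumed model ID
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```

bfloat16 trades mantissa precision for the same exponent range as float32, which is why it sidesteps the overflow that float16 hits on large logit values.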

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] commented 3 weeks ago

This issue was closed because it has been inactive for 5 days since being marked as stale.