wenda-LLM / wenda

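Wenda: an LLM invocation platform. It targets efficient content generation for specific environments, while taking into account the limited computing resources of individuals and small and medium-sized enterprises, as well as knowledge security and privacy concerns.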
GNU Affero General Public License v3.0

Qwen int4 quantized model raises an error at the end of output #504

Closed zhuang-maowei closed 9 months ago

zhuang-maowei commented 9 months ago

Describe the bug

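When using the Qwen int4 quantized model, the wenda chat box displays the output normally, but an error is raised when generation finishes, and the text in the chat box is replaced by an error message. The backend error is as follows: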

ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "E:\miniconda3\envs\wenda\Lib\site-packages\uvicorn\protocols\websockets\websockets_impl.py", line 254, in run_asgi
    result = await self.app(self.scope, self.asgi_receive, self.asgi_send)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\miniconda3\envs\wenda\Lib\site-packages\uvicorn\middleware\proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\miniconda3\envs\wenda\Lib\site-packages\fastapi\applications.py", line 284, in __call__
    await super().__call__(scope, receive, send)
  File "E:\miniconda3\envs\wenda\Lib\site-packages\starlette\applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "E:\miniconda3\envs\wenda\Lib\site-packages\starlette\middleware\errors.py", line 149, in __call__
    await self.app(scope, receive, send)
  File "E:\miniconda3\envs\wenda\Lib\site-packages\starlette\middleware\base.py", line 26, in __call__
    await self.app(scope, receive, send)
  File "E:\miniconda3\envs\wenda\Lib\site-packages\starlette\middleware\exceptions.py", line 79, in __call__
    raise exc
  File "E:\miniconda3\envs\wenda\Lib\site-packages\starlette\middleware\exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "E:\miniconda3\envs\wenda\Lib\site-packages\fastapi\middleware\asyncexitstack.py", line 20, in __call__
    raise e
  File "E:\miniconda3\envs\wenda\Lib\site-packages\fastapi\middleware\asyncexitstack.py", line 17, in __call__
    await self.app(scope, receive, send)
  File "E:\miniconda3\envs\wenda\Lib\site-packages\starlette\routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "E:\miniconda3\envs\wenda\Lib\site-packages\starlette\routing.py", line 341, in handle
    await self.app(scope, receive, send)
  File "E:\miniconda3\envs\wenda\Lib\site-packages\starlette\routing.py", line 82, in app
    await func(session)
  File "E:\miniconda3\envs\wenda\Lib\site-packages\fastapi\routing.py", line 292, in app
    await dependant.call(**values)
  File "E:\wenda\wenda\wenda.py", line 388, in websocket_endpoint
    raise e
  File "E:\wenda\wenda\wenda.py", line 377, in websocket_endpoint
    for response in LLM.chat_one(prompt, history_formatted, max_length, top_p, temperature, data):
  File "E:\wenda\wenda\llms\llm_qwen.py", line 48, in chat_one
    for response in model.chat_stream(tokenizer, prompt, history=history):
  File "C:\Users\Administrator/.cache\huggingface\modules\transformers_modules\Qwen-14B-Chat-Int4\modeling_qwen.py", line 1292, in stream_generator
    for token in self.generate_stream(
  File "E:\miniconda3\envs\wenda\Lib\site-packages\torch\utils\_contextlib.py", line 56, in generator_context
    response = gen.send(request)
               ^^^^^^^^^^^^^^^^^
  File "E:\miniconda3\envs\wenda\Lib\site-packages\transformers_stream_generator\main.py", line 969, in sample_stream
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

Screenshots: (image not preserved)

Desktop (please complete the following information):

Additional context: the same input works without any problems on the non-quantized model Qwen-14B-Chat.

l15y commented 9 months ago

The model no longer converges after quantization; this is a problem with the model.
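
For readers hitting the same `RuntimeError`, here is a minimal pure-Python sketch of how a single non-finite logit can turn into an invalid probability vector, which is the condition the error message describes. It is an illustration only: `naive_softmax` and `is_valid_distribution` are hypothetical helper names, not part of wenda, Qwen, or PyTorch, and whether the Int4 model actually produces non-finite logits in this case is an assumption rather than something confirmed in the thread.

```python
import math


def naive_softmax(logits):
    """Softmax without numerical safeguards (illustration only)."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_valid_distribution(probs):
    """Return False if any entry is inf, nan, or negative."""
    return all(math.isfinite(p) and p >= 0 for p in probs)


# Finite logits give a usable distribution.
healthy = naive_softmax([1.0, 2.0, 3.0])

# Assumed failure mode: one logit has overflowed to inf,
# so inf / inf yields nan in the resulting probabilities.
broken = naive_softmax([1.0, float("inf")])

print(is_valid_distribution(healthy))
print(is_valid_distribution(broken))
```

The second vector contains a `nan`, so a sampler that requires finite, non-negative probabilities would reject it. In a PyTorch setting, a similar guard could be written with `torch.isfinite(probs).all()` before sampling, though that would only surface the problem earlier, not fix the underlying model output.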