ztxz16 / fastllm

A pure C++ LLM acceleration library for all platforms, callable from Python; ChatGLM-6B-class models can reach 10000+ tokens/s on a single GPU; supports GLM, LLaMA, and MOSS base models and runs smoothly on mobile devices
Apache License 2.0

Error when the output is very long. #409

Closed aofengdaxia closed 7 months ago

aofengdaxia commented 7 months ago

How the error occurs

# The error occurs when the following call generates relatively long content, for example when asking it to write a Snake game in Python.
stream_generate = model.stream_response_raw(input_tokens=inputs["input_ids"][0], **gen_kwargs, stop_token_ids=stop_token_ids)

The error is as follows:

+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/Users/zhangshiyu/Documents/项目/call/fllm-openai/venv/lib/python3.11/site-packages/sse_starlette/sse.py", line 258, in wrap
    |     await func()
    |   File "/Users/zhangshiyu/Documents/项目/call/fllm-openai/venv/lib/python3.11/site-packages/sse_starlette/sse.py", line 245, in stream_response
    |     async for data in self.body_iterator:
    |   File "/Users/zhangshiyu/Documents/项目/call/fllm-openai/venv/lib/python3.11/site-packages/starlette/concurrency.py", line 63, in iterate_in_threadpool
    |     yield await anyio.to_thread.run_sync(_next, as_iterator)
    |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/Users/zhangshiyu/Documents/项目/call/fllm-openai/venv/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    |     return await get_async_backend().run_sync_in_worker_thread(
    |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/Users/zhangshiyu/Documents/项目/call/fllm-openai/venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2134, in run_sync_in_worker_thread
    |     return await future
    |            ^^^^^^^^^^^^
    |   File "/Users/zhangshiyu/Documents/项目/call/fllm-openai/venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    |     result = context.run(func, *args)
    |              ^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/Users/zhangshiyu/Documents/项目/call/fllm-openai/venv/lib/python3.11/site-packages/starlette/concurrency.py", line 52, in _next
    |     return next(iterator)
    |            ^^^^^^^^^^^^^^
    |   File "/Users/zhangshiyu/Documents/项目/call/fllm-openai/api_server.py", line 224, in predict_stream
    |     for new_response in generate_stream_chatglm3(model, tokenizer, gen_params):
    |   File "/Users/zhangshiyu/Documents/项目/call/fllm-openai/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 56, in generator_context
    |     response = gen.send(request)
    |                ^^^^^^^^^^^^^^^^^
    |   File "/Users/zhangshiyu/Documents/项目/call/fllm-openai/utils.py", line 89, in generate_stream_chatglm3
    |     for ret in stream_generate:
    |   File "/Users/zhangshiyu/Documents/项目/call/fllm-openai/venv/lib/python3.11/site-packages/fastllm_pytools-0.0.1-py3.11.egg/fastllm_pytools/llm.py", line 313, in stream_response_raw
    |     cur_bytes = self.tokenizer_decode_token(cur_token)
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/Users/zhangshiyu/Documents/项目/call/fllm-openai/venv/lib/python3.11/site-packages/fastllm_pytools-0.0.1-py3.11.egg/fastllm_pytools/llm.py", line 185, in tokenizer_decode_token
    |     if self.thread_local_obj.tokenizer_decode_token__output_buffer is None:
    |        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    | AttributeError: '_thread._local' object has no attribute 'tokenizer_decode_token__output_buffer'
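From the traceback, the generator is being driven through starlette's iterate_in_threadpool, i.e. each next() call may run on an anyio worker thread. Attributes set on a threading.local object in one thread do not exist in other threads, so if tokenizer_decode_token__output_buffer was only ever assigned in a different thread, reading it here raises AttributeError. A minimal stdlib-only sketch of this pitfall (the names are illustrative, not fastllm's code):

import threading

local = threading.local()
local.buffer = None          # assigned only in the main thread

def worker():
    try:
        # AttributeError: this thread's local has never set .buffer
        if local.buffer is None:
            local.buffer = bytearray(256)
    except AttributeError as e:
        print("worker thread:", e)

t = threading.Thread(target=worker)
t.start()
t.join()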
aofengdaxia commented 7 months ago

After changing line 185 of llm.py as follows, my tests pass for now. The change:

if "tokenizer_decode_token__output_buffer" not in dir(self.thread_local_obj) or self.thread_local_obj.tokenizer_decode_token__output_buffer is None:
            self.thread_local_obj.tokenizer_decode_token__output_buffer = ctypes.create_string_buffer(output_buffer_init_len)
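An equivalent, slightly more idiomatic guard would use getattr with a default instead of dir(); this is just a sketch of the same fix, assuming output_buffer_init_len is the buffer-size constant already in scope at that point in llm.py:

buffer = getattr(self.thread_local_obj, "tokenizer_decode_token__output_buffer", None)
if buffer is None:
    self.thread_local_obj.tokenizer_decode_token__output_buffer = ctypes.create_string_buffer(output_buffer_init_len)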
aofengdaxia commented 7 months ago

This has been resolved by submitting a PR.