ymcui / Chinese-LLaMA-Alpaca-2

Chinese LLaMA-2 & Alpaca-2 LLMs (phase two of the project) with 64K long context models
Apache License 2.0
7k stars 570 forks

Excessive GPU memory usage at runtime and no JSON response body returned #525

Closed xiaoToby closed 3 months ago

xiaoToby commented 4 months ago





Chinese-Alpaca-2 (7B/13B)




After deploying the chinese-alpaca-2-7b model locally, I tested it with scripts/openai_server_demo/openai_api_server.py using the following command (the prompt asks for some stories about Hangzhou):

curl http://localhost:19327/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{ "messages": [ {"role": "user","content": "给我讲一些有关杭州的故事吧"} ], "repetition_penalty": 1.0 }'
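For reference, the same request can be sent from Python, which also makes it easier to inspect the JSON body that comes back. This is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send` are hypothetical, the URL and payload are copied from the curl command above, and it assumes the server answers with a JSON body.

```python
import json
import urllib.request

# Endpoint taken from the curl command in this comment.
API_URL = "http://localhost:19327/v1/chat/completions"


def build_chat_request(prompt, repetition_penalty=1.0, url=API_URL):
    """Build (but do not send) a POST request with the same JSON body as the curl call."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "repetition_penalty": repetition_penalty,
    }
    # ensure_ascii=False keeps a Chinese prompt readable; the body is sent as UTF-8.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=300):
    """Send the request and decode the JSON response body (assumes a JSON reply)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send(build_chat_request("给我讲一些有关杭州的故事吧"))` returns the decoded response as a Python dict, or raises `urllib.error.HTTPError` if the server replies with an error status.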

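  1. First I ran on the GPU; GPU memory usage was too high and an error was raised.
  2. With --only_gpu, I did not get the expected answer.

Questions:
  1. Is there a way to reduce the excessive GPU memory usage?
  2. How can I get the expected question-and-answer style reply?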





iMountTai commented 4 months ago
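  1. Try loading the model in 4-bit/8-bit for inference; load it with flash-attn2 or sdpa; set gpus to all of the cards on the machine.
  2. It is unclear whether you are using the model and the prompt template correctly; please post the exact command you ran.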
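
To put the first suggestion in numbers, here is a rough back-of-the-envelope estimate of the memory needed for the model weights alone at different precisions. It is a sketch under stated assumptions: the 7e9 parameter count is a round figure rather than the exact size of chinese-alpaca-2-7b, the function name `weight_memory_gib` is hypothetical, and the KV cache, activations, and quantization overhead are ignored, so real usage will be higher.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory for model weights only, in GiB."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 1024**3


# Assumption: a round 7 billion parameters for a "7B" model.
NUM_PARAMS = 7e9

for label, bits in [("fp16", 16), ("8bit", 8), ("4bit", 4)]:
    print(f"{label}: {weight_memory_gib(NUM_PARAMS, bits):.1f} GiB")
```

Under these assumptions the weights take roughly 13 GiB in fp16, 6.5 GiB in 8-bit, and 3.3 GiB in 4-bit, which is why quantized loading is the first thing to try on a card with limited memory.
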
xiaoToby commented 4 months ago
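  1. Try loading the model in 4-bit/8-bit for inference; load it with flash-attn2 or sdpa; set gpus to all of the cards on the machine.
  2. It is unclear whether you are using the model and the prompt template correctly; please post the exact command you ran.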

image @iMountTai

xiaoToby commented 4 months ago


iMountTai commented 4 months ago


xiaoToby commented 4 months ago


xiaoToby commented 4 months ago

2024-02-21 05:54:04,403 - ERROR - Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/h11_impl.py", line 412, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 83, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 758, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 778, in app
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 299, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 79, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 74, in app
    response = await func(request)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 299, in app
    raise e
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 294, in app
    raw_response = await run_endpoint_function(
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/home/Chinese-LLaMA-Alpaca-2/scripts/openai_server_demo/openai_api_server.py", line 354, in create_completion
    output = predict(
  File "/home/Chinese-LLaMA-Alpaca-2/scripts/openai_server_demo/openai_api_server.py", line 206, in predict
    generation_output = model.generate(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1789, in generate
    return self.beam_sample(
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 3501, in beam_sample
    if beam_scorer.is_done or stopping_criteria(input_ids, scores):
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
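
The error message itself suggests rerunning with CUDA_LAUNCH_BLOCKING=1, so that the failing kernel is reported at the call that launched it rather than at a later one. A minimal sketch of relaunching the server that way from Python follows; the helper names `debug_env` and `launch_server_for_debugging` are hypothetical, and the command line is the one posted elsewhere in this thread. This only makes the traceback more precise; it does not by itself fix the assert.

```python
import os
import subprocess
import sys


def debug_env(base_env=None):
    """Return a copy of the environment with synchronous CUDA kernel launches enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # Must be set before the process initializes CUDA, hence a fresh subprocess.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def launch_server_for_debugging():
    """Start the demo server with the debug environment (command copied from this thread)."""
    cmd = [
        sys.executable,
        "scripts/openai_server_demo/openai_api_server.py",
        "--base_model", "models/",
        "--gpus", "0",
    ]
    return subprocess.run(cmd, env=debug_env(), check=False)
```

Synchronous launches slow inference down noticeably, so this environment is meant for reproducing the failure once, not for normal serving.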

xiaoToby commented 4 months ago


iMountTai commented 4 months ago


curl http://localhost:19327/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "给我讲一些有关杭州的故事吧"
  }'

xiaoToby commented 4 months ago

curl http://localhost:19327/v1/completions \
  -H "Content-Type: application/json" \
  -d '{ "prompt": "给我讲一些有关杭州的故事吧" }'

image The error message is the same as before.

iMountTai commented 4 months ago


xiaoToby commented 4 months ago


image image

xiaoToby commented 4 months ago

python scripts/openai_server_demo/openai_api_server.py --base_model models/ --gpus 0


iMountTai commented 4 months ago


xiaoToby commented 4 months ago

Why does the following error occur as soon as multiple GPUs are used?

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

@iMountTai

github-actions[bot] commented 4 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.

github-actions[bot] commented 3 months ago

Closing the issue, since no updates observed. Feel free to re-open if you need any further assistance.

Rick-24 commented 3 months ago

Hello, I have received your message and will deal with it as soon as possible. Best wishes!