ymcui / Chinese-LLaMA-Alpaca-2

中文LLaMA-2 & Alpaca-2大模型二期项目 + 64K超长上下文模型 (Chinese LLaMA-2 & Alpaca-2 LLMs with 64K long context models)
Apache License 2.0

Excessive GPU memory usage at runtime, and no JSON response body returned #525

Closed xiaoToby closed 3 months ago

xiaoToby commented 4 months ago

The following items must be checked before submission

Issue type

Output quality issue

Base model

Chinese-Alpaca-2 (7B/13B)

Operating system

Linux

Detailed description of the issue

After deploying the chinese-alpaca-2-7b model locally, I tested scripts/openai_server_demo/openai_api_server.py with the following command:

curl http://localhost:19327/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
  "messages": [
    {"role": "user", "content": "给我讲一些有关杭州的故事吧"}
  ],
  "repetition_penalty": 1.0
}'

  1. First I ran it on the GPU; GPU memory usage was too high and it errored out.
  2. With --only_gpu, I did not get the expected answer.

Questions: 1. Is there any way to reduce the GPU memory usage? 2. How can I get the expected chat-style reply?

Dependencies (must be provided for code-related issues)

# Paste your dependency information here (inside this code block)

Run logs or screenshots

[screenshot]

iMountTai commented 4 months ago
  1. Try loading the model in 4-bit/8-bit for inference; load it with flash-attn2 or sdpa; set gpus to all the cards on the machine.
  2. It is unclear whether you are using the model and prompt template correctly; please post the full command you ran.
xiaoToby commented 4 months ago
> 1. Try loading the model in 4-bit/8-bit for inference; load it with flash-attn2 or sdpa; set gpus to all the cards on the machine.
> 2. It is unclear whether you are using the model and prompt template correctly; please post the full command you ran.

[screenshot] @iMountTai

xiaoToby commented 4 months ago

What are the GPU memory requirements for running these models? I am currently using a single 12 GB GPU, which does not seem to be enough.

iMountTai commented 4 months ago

The 7B model's weights alone are about 14 GB, so a 12 GB GPU is definitely not enough, and CPU inference is too slow; I suggest trying llama.cpp instead.
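The arithmetic behind these numbers is just parameter count times bytes per parameter, a rough rule of thumb for the weights alone (activations and the KV cache need extra headroom on top):

```python
# Rough footprint of dense LLM weights: parameters x bytes per parameter.
def weight_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate weight size in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9

PARAMS_7B = 7e9
print(weight_gb(PARAMS_7B, 2.0))  # fp16: 14.0 GB -> does not fit a 12 GB card
print(weight_gb(PARAMS_7B, 1.0))  # int8:  7.0 GB
print(weight_gb(PARAMS_7B, 0.5))  # 4-bit: 3.5 GB
```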

xiaoToby commented 4 months ago

I added more GPUs and can now run the model on GPU, but I still do not get a JSON response body.

xiaoToby commented 4 months ago

2024-02-21 05:54:04,403 - ERROR - Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/h11_impl.py", line 412, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 83, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 758, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 778, in app
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 299, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 79, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 74, in app
    response = await func(request)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 299, in app
    raise e
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 294, in app
    raw_response = await run_endpoint_function(
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/home/Chinese-LLaMA-Alpaca-2/scripts/openai_server_demo/openai_api_server.py", line 354, in create_completion
    output = predict(
  File "/home/Chinese-LLaMA-Alpaca-2/scripts/openai_server_demo/openai_api_server.py", line 206, in predict
    generation_output = model.generate(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1789, in generate
    return self.beam_sample(
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 3501, in beam_sample
    if beam_scorer.is_done or stopping_criteria(input_ids, scores):
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

xiaoToby commented 4 months ago

@iMountTai

iMountTai commented 4 months ago

Try this:

curl http://localhost:19327/v1/completions \
-H "Content-Type: application/json" \
-d '{
"prompt": "给我讲一些有关杭州的故事吧"
}'
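If the call succeeds, the server should return an OpenAI-style JSON body. A minimal sketch of extracting the generated text (the sample body below is hypothetical; the demo server's exact fields may differ):

```python
import json

# Hypothetical /v1/completions response in the OpenAI-style schema the demo mimics.
raw = '{"id": "cmpl-0", "object": "text_completion", "choices": [{"text": "example"}]}'

body = json.loads(raw)
text = body["choices"][0]["text"]  # the generated answer lives under choices[0]
print(text)
```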
xiaoToby commented 4 months ago

> curl http://localhost:19327/v1/completions \
> -H "Content-Type: application/json" \
> -d '{ "prompt": "给我讲一些有关杭州的故事吧" }'

[screenshot] The error message is the same as before.

iMountTai commented 4 months ago

Please post your current run command, i.e. the command used to start the server.

xiaoToby commented 4 months ago

> Please post your current run command, i.e. the command used to start the server.

[screenshot] [screenshot]

xiaoToby commented 4 months ago

python scripts/openai_server_demo/openai_api_server.py --base_model models/ --gpus 0

Running it like this works.

iMountTai commented 4 months ago

Good. A single card is indeed enough. On my side I tested with three cards and it ran normally, so the difference may come from the specific environment.

xiaoToby commented 4 months ago

Why does using multiple GPUs cause the following error?

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

@iMountTai
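As the error message itself suggests, one debugging step (a sketch for diagnosis, not a fix) is to make CUDA kernel launches synchronous so the assert surfaces at the real failing call; the variable must be set before torch first touches CUDA:

```python
# Debugging sketch: synchronous CUDA kernel launches for accurate stack traces.
# Set this before torch/CUDA is first used, e.g. at the very top of
# openai_api_server.py, or as `CUDA_LAUNCH_BLOCKING=1 python ...` in the shell.
import os

os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # report errors at the launching call
```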

github-actions[bot] commented 4 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.

github-actions[bot] commented 3 months ago

Closing the issue, since no updates observed. Feel free to re-open if you need any further assistance.

Rick-24 commented 3 months ago

Hello, I have received your message and will deal with it as soon as possible. Best regards!