ymcui / Chinese-LLaMA-Alpaca-2

中文LLaMA-2 & Alpaca-2大模型二期项目 + 64K超长上下文模型 (Chinese LLaMA-2 & Alpaca-2 LLMs with 64K long context models)
Apache License 2.0

Excessive GPU memory usage at runtime, and no JSON response body returned #525

Closed xiaoToby closed 3 months ago

xiaoToby commented 4 months ago

The following items must be checked before submission

Issue type

Output quality issue

Base model

Chinese-Alpaca-2 (7B/13B)

Operating system

Linux

Detailed description of the issue

After deploying the chinese-alpaca-2-7b model locally, I tested scripts/openai_server_demo/openai_api_server.py with the following command:

curl http://localhost:19327/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
  "messages": [
    {"role": "user", "content": "给我讲一些有关杭州的故事吧"}
  ],
  "repetition_penalty": 1.0
}'

  1. First I ran it on the GPU; GPU memory usage was too high and it errored out.
  2. With --only_gpu, I did not get the expected answer.

Questions: 1. Is there any way to reduce the GPU memory usage? 2. How can I get the expected chat-style reply?

Dependencies (must be provided for code-related issues)

# Paste your dependency information here (inside this code block)

Run logs or screenshots

[screenshot]

iMountTai commented 4 months ago
  1. Try loading the model in 4-bit/8-bit for inference; load it with flash-attn2 or sdpa; set gpus to all the cards on the machine.
  2. It is unclear whether you are using the model and prompt template correctly; please post the full command you ran.
xiaoToby commented 4 months ago
> 1. Try loading the model in 4-bit/8-bit for inference; load it with flash-attn2 or sdpa; set gpus to all the cards on the machine.
> 2. It is unclear whether you are using the model and prompt template correctly; please post the full command you ran.

[screenshot] @iMountTai

xiaoToby commented 4 months ago

What are the GPU memory requirements for running these models? I am currently using a single 12 GB GPU, which does not seem to be enough.

iMountTai commented 4 months ago

The 7B model's weights alone are about 14 GB, so a 12 GB GPU is definitely not enough, and CPU inference is too slow; I suggest trying llama.cpp instead.
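The arithmetic behind these numbers is just parameter count times bytes per parameter, a rough rule of thumb for the weights alone (activations and the KV cache need extra headroom on top):

```python
# Rough footprint of dense LLM weights: parameters x bytes per parameter.
def weight_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate weight size in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9

PARAMS_7B = 7e9
print(weight_gb(PARAMS_7B, 2.0))  # fp16: 14.0 GB -> does not fit a 12 GB card
print(weight_gb(PARAMS_7B, 1.0))  # int8:  7.0 GB
print(weight_gb(PARAMS_7B, 0.5))  # 4-bit: 3.5 GB
```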

xiaoToby commented 4 months ago

I added more GPUs and can now run the model on GPU, but I still do not get a JSON response body.

xiaoToby commented 4 months ago

2024-02-21 05:54:04,403 - ERROR - Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/h11_impl.py", line 412, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 83, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 758, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 778, in app
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 299, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 79, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 74, in app
    response = await func(request)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 299, in app
    raise e
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 294, in app
    raw_response = await run_endpoint_function(
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/home/Chinese-LLaMA-Alpaca-2/scripts/openai_server_demo/openai_api_server.py", line 354, in create_completion
    output = predict(
  File "/home/Chinese-LLaMA-Alpaca-2/scripts/openai_server_demo/openai_api_server.py", line 206, in predict
    generation_output = model.generate(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1789, in generate
    return self.beam_sample(
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 3501, in beam_sample
    if beam_scorer.is_done or stopping_criteria(input_ids, scores):
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

xiaoToby commented 4 months ago

@iMountTai

iMountTai commented 4 months ago

Try this:

curl http://localhost:19327/v1/completions \
-H "Content-Type: application/json" \
-d '{
"prompt": "给我讲一些有关杭州的故事吧"
}'
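If the call succeeds, the server should return an OpenAI-style JSON body. A minimal sketch of extracting the generated text (the sample body below is hypothetical; the demo server's exact fields may differ):

```python
import json

# Hypothetical /v1/completions response in the OpenAI-style schema the demo mimics.
raw = '{"id": "cmpl-0", "object": "text_completion", "choices": [{"text": "example"}]}'

body = json.loads(raw)
text = body["choices"][0]["text"]  # the generated answer lives under choices[0]
print(text)
```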
xiaoToby commented 4 months ago

> curl http://localhost:19327/v1/completions \
> -H "Content-Type: application/json" \
> -d '{ "prompt": "给我讲一些有关杭州的故事吧" }'

[screenshot] The error message is the same as before.

iMountTai commented 4 months ago

Please post your current run command, i.e. the command used to start the server.

xiaoToby commented 4 months ago

> Please post your current run command, i.e. the command used to start the server.

[screenshot] [screenshot]

xiaoToby commented 4 months ago

python scripts/openai_server_demo/openai_api_server.py --base_model models/ --gpus 0

Running it like this works.

iMountTai commented 4 months ago

Good. A single card is indeed enough. On my side I tested with three cards and it ran normally, so the difference may come from the specific environment.

xiaoToby commented 4 months ago

Why does using multiple GPUs cause the following error?

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

@iMountTai
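As the error message itself suggests, one debugging step (a sketch for diagnosis, not a fix) is to make CUDA kernel launches synchronous so the assert surfaces at the real failing call; the variable must be set before torch first touches CUDA:

```python
# Debugging sketch: synchronous CUDA kernel launches for accurate stack traces.
# Set this before torch/CUDA is first used, e.g. at the very top of
# openai_api_server.py, or as `CUDA_LAUNCH_BLOCKING=1 python ...` in the shell.
import os

os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # report errors at the launching call
```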

github-actions[bot] commented 4 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.

github-actions[bot] commented 3 months ago

Closing the issue, since no updates observed. Feel free to re-open if you need any further assistance.

Rick-24 commented 3 months ago

Hello, I have received your message and will deal with it as soon as possible. Best regards!