hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0
33.98k stars, 4.18k forks

910A inference: Qwen1.5 works in chat mode, but API mode fails with an aicpu exception #4666

Closed qyd-gc closed 4 months ago

qyd-gc commented 4 months ago

Reminder

System Info

The image was built from docker-npu. The docker-compose file:

services:
  llamafactory:
    #build:
    #  dockerfile: ./docker/docker-npu/Dockerfile
    #  context: ../..
    #  args:
    #    INSTALL_DEEPSPEED: false
    #    PIP_INDEX: https://pypi.org/simple
    image: llama:1.0
    container_name: llamafactory
    volumes:
      - ../../hf_cache:/root/.cache/huggingface
      - ../../ms_cache:/root/.cache/modelscope
      - ../../data:/app/data
      - ../../output:/app/output
      - /usr/local/dcmi:/usr/local/dcmi
      - /usr/local/bin/npu-smi:/usr/local/bin/npu-smi
      - /usr/local/Ascend/driver:/usr/local/Ascend/driver
      - /etc/ascend_install.info:/etc/ascend_install.info
      - ./model:/home/mind/model
      - ./llama.yaml:/home/mind/llama.yaml
    ports:
      - "7860:7860"
      - "8000:8000"
    ipc: host
    tty: true
    stdin_open: true
    command: bash
    devices:
      - /dev/davinci0
      - /dev/davinci_manager
      - /dev/devmm_svm
      - /dev/hisi_hdc
    restart: unless-stopped
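
For context, a minimal sketch of how this compose setup is driven (container name, mount paths, and ports are taken from the file above; the exact start command appears under Reproduction below):

docker compose up -d                          # start the llamafactory container (command is bash, kept alive by tty: true)
docker exec -it llamafactory bash             # attach a shell inside the container
llamafactory-cli api /home/mind/llama.yaml    # serve the OpenAI-compatible API on port 8000 (mapped in ports above)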

llama.yaml configuration file:

model_name_or_path: /home/mind/model/Qwen1.5-0.5B
template: qwen
do_sample: false

Reproduction

Start command: llamafactory-cli api llama.yaml. The API service starts up normally, but calling the endpoint returns an error.

curl:

curl --request POST \
  --url http://localhost:8000/v1/chat/completions \
  --header 'content-type: application/json' \
  --data '{
    //"model": "/home/mind/model/Qwen1.5-0.5B",
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Who won the world series in 2020?" }
    ]
  }'

Error message:

[W VariableFallbackKernel.cpp:51] Warning: CAUTION: The operator 'aten::isin.Tensor_Tensor_out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback)
/usr/local/lib/python3.10/dist-packages/transformers/generation/logits_process.py:1601: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at torch_npu/csrc/aten/common/TensorFactories.cpp:74.)
  scores_processed = torch.where(scores != scores, 0.0, scores)
E39999: Inner Error!
E39999: 2024-07-03-08:32:53.639.718 An exception occurred during AICPU execution, stream_id:2, task_id:2173, errcode:21008, msg:inner error[FUNC:ProcessAicpuErrorInfo][FILE:device_error_proc.cc][LINE:730]
TraceBack (most recent call last):
Kernel task happen error, retCode=0x2a, [aicpu exception].[FUNC:PreCheckTaskErr][FILE:task_info.cc][LINE:1776]
Aicpu kernel execute failed, device_id=0, stream_id=2, task_id=2173, errorCode=2a.[FUNC:PrintAicpuErrorInfo][FILE:task_info.cc][LINE:1579]
Aicpu kernel execute failed, device_id=0, stream_id=2, task_id=2173, fault op_name=[FUNC:GetError][FILE:stream.cc][LINE:1512]
rtStreamSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
synchronize stream failed, runtime result = 507018[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]

DEVICE[0] PID[2095]: EXCEPTION TASK:
Exception info:TGID=3724449, model id=65535, stream id=2, stream phase=SCHEDULE, task id=2173, task type=aicpu kernel, recently received task id=2173, recently send task id=2172, task phase=RUN
Message info[0]:aicpu=0,slot_id=0,report_mailbox_flag=0x5a5a5a5a,state=0x5210
Other info[0]:time=2024-07-03-16:32:52.954.172, function=proc_aicpu_task_done, line=970, error code=0x2a
INFO: 172.19.0.1:38758 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 399, in run_asgi
    result = await app( # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 70, in call
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in call
    await super().call(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 123, in call
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 186, in call
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 164, in call
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 85, in call
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 65, in call
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 756, in call
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 72, in app
    response = await func(request)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(values)
  File "/app/src/llamafactory/api/app.py", line 99, in create_chat_completion
    return await create_chat_completion_response(request, chat_model)
  File "/app/src/llamafactory/api/chat.py", line 148, in create_chat_completion_response
    responses = await chat_model.achat(
  File "/app/src/llamafactory/chat/chat_model.py", line 72, in achat
    return await self.engine.chat(messages, system, tools, image, input_kwargs)
  File "/app/src/llamafactory/chat/hf_engine.py", line 296, in chat
    return await loop.run_in_executor(pool, self._chat, input_args)
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(self.args, self.kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(args, kwargs)
  File "/app/src/llamafactory/chat/hf_engine.py", line 189, in _chat
    generate_output = model.generate(gen_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(args, kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1914, in generate
    result = self._sample(
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2711, in _sample
    unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/stopping_criteria.py", line 508, in call
    is_done = is_done | criteria(input_ids, scores, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/stopping_criteria.py", line 499, in call
    is_done = torch.isin(input_ids[:, -1], self.eos_token_id)
RuntimeError: ACL stream synchronize failed, error code:507018

device0 debug log:

[ERROR] AICPU(32550,aicpu_scheduler):2024-07-03-16:32:52.958.031 [multinomialwithreplacement.cc:95][DoCompute][tid:32558]input must >= 0
[ERROR] CCECPU(32550,aicpu_scheduler):2024-07-03-16:32:52.958.072 [ae_kernel_lib_aicpu.cc:277][TransformKernelErrorCode][tid:32558][AICPU_PROCESSER] call aicpu api RunCpuKernel in libcpu_kernels.so failed, ret:4294967295.
[ERROR] CCECPU(32550,aicpu_scheduler):2024-07-03-16:32:52.958.085 [aicpusd_event_process.cpp:1525][ExecuteTsKernelTask][tid:32558] Aicpu engine process failed, result[-1], opName[MultinomialWithReplacement].

Expected behavior

I originally hit the same problem in chat mode, but after adding do_sample: false to the llama.yaml configuration file, chat mode works. Adding the same setting does not help in API mode, though.

Others

No response

qyd-gc commented 4 months ago

The curl command is:

curl --request POST \
  --url http://localhost:8000/v1/chat/completions \
  --header 'content-type: application/json' \
  --data '{
    "model": "/home/mind/model/Qwen1.5-0.5B",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Who won the world series in 2020?" }
    ]
  }'

hiyouga commented 4 months ago

You need to add do_sample: false in the curl request.
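
For example, a minimal sketch of the adjusted request (the top-level do_sample field in the JSON body follows hiyouga's suggestion above; its exact placement in the request schema is not verified here):

curl --request POST \
  --url http://localhost:8000/v1/chat/completions \
  --header 'content-type: application/json' \
  --data '{
    "model": "/home/mind/model/Qwen1.5-0.5B",
    "do_sample": false,
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Who won the world series in 2020?" }
    ]
  }'

With sampling disabled, generation falls back to greedy decoding, which should avoid the MultinomialWithReplacement AICPU kernel that fails in the device debug log above.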

qyd-gc commented 2 months ago

There is a Dockerfile under docker/docker-npu in the repository; I built my image from that Dockerfile. The base images listed in it are:

# FROM cosdt/cann:8.0.rc1-910-ubuntu22.04
FROM cosdt/cann:8.0.rc1-910b-ubuntu22.04
# FROM cosdt/cann:8.0.rc1-910-openeuler22.03
# FROM cosdt/cann:8.0.rc1-910b-openeuler22.03

Choose the one that matches your NPU model; the default here is the 910b, Ubuntu 22.04, CANN 8.0.rc1 image.
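
As a reference, a rough sketch of an equivalent docker build invocation, run from the repository root (the build args and image tag mirror the commented-out build section of the docker-compose file earlier in this issue; adjust them to your setup):

# choose the matching FROM line in docker/docker-npu/Dockerfile first (e.g. the 910 image for a 910A card)
docker build \
  -f ./docker/docker-npu/Dockerfile \
  --build-arg INSTALL_DEEPSPEED=false \
  --build-arg PIP_INDEX=https://pypi.org/simple \
  -t llama:1.0 \
  .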

(In reply to the question emailed on 2024-08-14: "I'd like to ask: since your image was built from docker-npu, which base image did you use? Thanks.")

zangxyyds commented 2 months ago

(Quoting qyd-gc's reply above.)

Thank you very much, it does indeed work. But I ran into a problem: a single card works fine, while for multiple cards I added the following device entries:

      - /dev/davinci1
      # - /dev/davinci2
      # - /dev/davinci3
      # - /dev/davinci4
      # - /dev/davinci5
      # - /dev/davinci6
      # - /dev/davinci7

and the multi-card deployment just hangs at that point. I am not sure whether I need to change anything in docker-compose.yaml. Have you run into this with multiple cards? Also, since inference works for you, have you tried fine-tuning, and does that work as well? Thanks a lot.
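
For illustration, a sketch of how the multi-card devices section might look, assuming every NPU you want is both passed through and made visible via ASCEND_RT_VISIBLE_DEVICES (the variable LLaMA-Factory uses to select Ascend devices); whether this alone resolves the hang is not confirmed in this thread:

    environment:
      - ASCEND_RT_VISIBLE_DEVICES=0,1   # assumption: matches the davinci devices passed through below
    devices:
      - /dev/davinci0
      - /dev/davinci1
      - /dev/davinci_manager
      - /dev/devmm_svm
      - /dev/hisi_hdc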