[BUG:API] Calling the API after deploying to an HF Space fails with RuntimeError: CUDA must not be initialized in the main process on Spaces with Stateless GPU environment.

steveoon commented 1 month ago

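Checklist

[X] I have read README.md and dependencies.md
[X] I have confirmed that no existing issue or discussion covers this bug
[X] I have confirmed that the problem occurs on the latest code or a stable release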

Forge commit or tag

✨ add script.spk.code_to_spk

Python version

python:3.10.13

PyTorch version

pytorch-lightning==2.4.0/vector-quantize-pytorch==1.16.2

Operating system information

PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

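Bug description

After deploying the project to an HF Space, I started the API service. Calling it then fails with:

RuntimeError: CUDA must not be initialized in the main process on Spaces with Stateless GPU environment. The full error log is included below.

Bug endpoint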

/v1/audio/speech

Reproduction parameters

curl -X POST https://x-x-hg.hf.space/v1/audio/speech \
-H "Authorization: Bearer token" \
-H "Content-Type: application/json" \
-d '{
  "model": "chat-tts",
  "input": "你好,我是Chat Inspire,一个AI驱动的智能对话助手,有什么可以帮到您的?",
  "voice": "Bob",
  "style": "chat",
  "enhance": true,
  "response_format": "mp3"
}'

Expected result

The API can be called normally on Hugging Face.

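Cause and possible solution:

The special Hugging Face Spaces environment: this Space runs on Zero hardware, which the error message calls a "Stateless GPU" environment and which requires CUDA to be initialized in a particular way.
Code changes: the code needs to be adapted to this environment, mainly in the model loading and CUDA initialization paths.
Using the spaces package: Hugging Face provides a dedicated package for this case (its spaces.zero submodule appears in the traceback).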
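
A minimal sketch of the usual ZeroGPU pattern follows, assuming the `spaces` package's `@spaces.GPU` decorator is used to mark the functions that need the GPU. `gpu_task`, `pick_device`, `load_model` and `synthesize` are hypothetical names, not part of ChatTTS-Forge, and the loader is a placeholder rather than the real ChatTTS loading code. As the maintainer notes later in this thread, Zero scheduling depends on the Gradio runtime, so the decorator alone may not be enough to make a FastAPI endpoint work.

```python
try:
    import spaces  # Hugging Face package available on ZeroGPU Spaces

    gpu_task = spaces.GPU
except ImportError:

    def gpu_task(fn):
        """Local fallback (assumption): run the function unchanged."""
        return fn


_MODEL = None  # cached per process, filled on the first request


def pick_device() -> str:
    """Return "cuda" when torch reports a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def load_model(device: str):
    """Hypothetical stand-in for the project's ChatTTS loader."""
    return lambda text: f"[{device}] {text}"


@gpu_task
def synthesize(text: str) -> str:
    """Load lazily and run inference inside the GPU-decorated function,
    so this module makes no CUDA call at import time."""
    global _MODEL
    if _MODEL is None:
        _MODEL = load_model(pick_device())
    return _MODEL(text)
```

The point of the design is that nothing touches CUDA when the module is imported; the device is only chosen once `synthesize` is actually called.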

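Actual result

Error: RuntimeError: CUDA must not be initialized in the main process on Spaces with Stateless GPU environment.
This error means that in the Hugging Face Spaces stateless GPU environment, CUDA (NVIDIA's parallel computing platform) must not be initialized in the main process.

Error log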

2024-09-02 15:20:31,696 - modules.repos_static.ChatTTS.ChatTTS.core - INFO - try to load from local: ./models/ChatTTS
2024-09-02 15:20:31,696 - modules.repos_static.ChatTTS.ChatTTS.core - INFO - checking assets...
2024-09-02 15:20:34,260 - modules.repos_static.ChatTTS.ChatTTS.core - INFO - all assets are already latest.
2024-09-02 15:20:34,528 - modules.devices.devices - ERROR - Error in torch_gc
Traceback (most recent call last):
  File "/home/user/app/modules/devices/devices.py", line 237, in wrapper
    ret = func(*args, **kwargs)
  File "/home/user/app/modules/core/models/zoo/ChatTTS.py", line 60, in load_chat_tts
    do_load_chat_tts()
  File "/home/user/app/modules/core/models/zoo/ChatTTS.py", line 28, in do_load_chat_tts
    chat_tts.load(
  File "/home/user/app/modules/repos_static/ChatTTS/ChatTTS/core.py", line 135, in load
    return self._load(
  File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/app/modules/repos_static/ChatTTS/ChatTTS/core.py", line 258, in _load
    .to(
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1173, in to
    return self._apply(convert)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 853, in _apply
    self._buffers[key] = fn(buf)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1159, in convert
    return t.to(
  File "/usr/local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 293, in _lazy_init
    torch._C._cuda_init()
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/torch/patching.py", line 245, in _cuda_init_raise
    raise RuntimeError(
RuntimeError: CUDA must not be initialized in the main process on Spaces with Stateless GPU environment.
You can look at this Stacktrace to find out which part of your code triggered a CUDA init

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/app/modules/devices/devices.py", line 118, in torch_gc
    torch.cuda.ipc_collect()
  File "/usr/local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 804, in ipc_collect
    _lazy_init()
  File "/usr/local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 293, in _lazy_init
    torch._C._cuda_init()
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/torch/patching.py", line 245, in _cuda_init_raise
    raise RuntimeError(
RuntimeError: CUDA must not be initialized in the main process on Spaces with Stateless GPU environment.
You can look at this Stacktrace to find out which part of your code triggered a CUDA init
2024-09-02 15:20:34,961 - modules.api.api_setup - ERROR - Uncaught exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/user/app/modules/core/pipeline/generate/BatchGenerate.py", line 51, in generate
    self.generate_batch(batch)
  File "/home/user/app/modules/core/pipeline/generate/BatchGenerate.py", line 68, in generate_batch
    results = model.generate_batch(segments=segments, context=self.context)
  File "/home/user/app/modules/core/models/tts/ChatTtsModel.py", line 61, in generate_batch
    return self.generate_batch_base(segments, context, stream=False)
  File "/home/user/app/modules/core/models/tts/ChatTtsModel.py", line 108, in generate_batch_base
    infer = self.get_infer(context)
  File "/home/user/app/modules/core/models/tts/ChatTtsModel.py", line 69, in get_infer
    return ChatTTSInfer(self.load())
  File "/home/user/app/modules/core/models/tts/ChatTtsModel.py", line 41, in load
    self.chat = load_chat_tts()
  File "/home/user/app/modules/devices/devices.py", line 237, in wrapper
    ret = func(*args, **kwargs)
  File "/home/user/app/modules/core/models/zoo/ChatTTS.py", line 60, in load_chat_tts
    do_load_chat_tts()
  File "/home/user/app/modules/core/models/zoo/ChatTTS.py", line 28, in do_load_chat_tts
    chat_tts.load(
  File "/home/user/app/modules/repos_static/ChatTTS/ChatTTS/core.py", line 135, in load
    return self._load(
  File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/app/modules/repos_static/ChatTTS/ChatTTS/core.py", line 258, in _load
    .to(
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1173, in to
    return self._apply(convert)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 853, in _apply
    self._buffers[key] = fn(buf)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1159, in convert
    return t.to(
  File "/usr/local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 293, in _lazy_init
    torch._C._cuda_init()
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/torch/patching.py", line 245, in _cuda_init_raise
    raise RuntimeError(
RuntimeError: CUDA must not be initialized in the main process on Spaces with Stateless GPU environment.
You can look at this Stacktrace to find out which part of your code triggered a CUDA init
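
The second traceback in the log above shows that the cleanup path, `torch_gc` in modules/devices/devices.py, itself calls `torch.cuda.ipc_collect()`, which runs `_lazy_init()` and raises the same error while the first one is being handled. One possible guard is to skip CUDA cleanup when CUDA was never initialized. The sketch below assumes that `torch.cuda.is_initialized()` behaves on Zero as it does in stock PyTorch; `torch_gc_safe` is a hypothetical name, not the project's function.

```python
def torch_gc_safe() -> bool:
    """Free cached CUDA memory only if CUDA is already initialized.

    Returns True when cleanup ran, False when it was skipped.
    """
    try:
        import torch
    except ImportError:
        return False  # no torch installed, nothing to clean up

    # In stock PyTorch, is_initialized() only reads an internal flag;
    # unlike ipc_collect(), it does not call _lazy_init().
    if not torch.cuda.is_available() or not torch.cuda.is_initialized():
        return False

    torch.cuda.empty_cache()
    torch.cuda.ipc_collect()
    return True
```

This would only silence the secondary exception; the original CUDA initialization during model loading still has to be addressed separately.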

steveoon commented 1 month ago

https://github.com/vllm-project/vllm/issues/3510

This issue may be a useful reference.

steveoon commented 1 month ago

The problem does not occur when deploying without Zero. It would be great if Zero were supported, though.

zhzLuke96 commented 1 month ago

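If you need to deploy on Hugging Face Zero, I recommend using a stable release from https://github.com/lenML/ChatTTS-Forge/tags. The latest development code is not guaranteed to support the Zero environment. To see how a Zero deployment behaves, try the HF online demo linked in the README.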

steveoon commented 1 month ago

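> If you need to deploy on Hugging Face Zero, I recommend using a stable release from https://github.com/lenML/ChatTTS-Forge/tags. The latest development code is not guaranteed to support the Zero environment. To see how a Zero deployment behaves, try the HF online demo linked in the README.

@zhzLuke96 I deployed version 0.7.0 on Zero using the Gradio setup and still get the same error: CUDA cannot be initialized in the main process. It happens on API calls; using the WebUI directly does not seem to trigger it.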

zhzLuke96 commented 1 month ago

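Oh, you intend to use the API. Our API is built on FastAPI and cannot be used directly in the Zero environment. Zero only supports the Gradio runtime and depends on it heavily: both waking Zero up and scheduling Zero tasks go through the Gradio runtime, so a Zero environment cannot be deployed or used without the Gradio SDK.

I recommend starting the Space with a non-Zero configuration.

Alternatively, try a simple hack: replace our API with the REST API that Gradio generates automatically. That API should work in the Zero environment, because the scheduling logic is all internal to Gradio.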
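
A rough sketch of that last suggestion, assuming a Gradio `Interface` is used to expose the synthesis function so that Gradio generates the API endpoint. `tts`, `build_demo`, `call_space`, the `/tts` endpoint name and the text output are illustrative placeholders; the real function would call the project's synthesis code and, on Zero, would presumably be wrapped in `@spaces.GPU`.

```python
def tts(text: str, voice: str = "Bob") -> str:
    """Hypothetical placeholder for the project's synthesis call."""
    return f"voice={voice} text={text}"


def build_demo():
    """Wrap tts() in a Gradio Interface; Gradio exposes it as an API endpoint."""
    import gradio as gr  # lazy import so this module loads without Gradio

    return gr.Interface(
        fn=tts,
        inputs=[gr.Textbox(label="text"), gr.Textbox(label="voice", value="Bob")],
        outputs=gr.Textbox(label="result"),
        api_name="tts",  # endpoint name is an assumption
    )


def call_space(space_id: str, text: str, voice: str = "Bob"):
    """Call the generated endpoint from another machine via gradio_client."""
    from gradio_client import Client

    client = Client(space_id)  # e.g. "owner/space-name" (placeholder)
    return client.predict(text, voice, api_name="/tts")
```

On the Space, `build_demo().launch()` would serve both the UI and the generated endpoint; a client could then run `call_space("owner/space-name", "hello")`, where the Space id is a placeholder.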
