hpcaitech / Open-Sora

Open-Sora: Democratizing Efficient Video Production for All
https://hpcaitech.github.io/Open-Sora/
Apache License 2.0

Huggingface Space gives HTTP 500 / RuntimeError: Error while initializing ZeroGPU: Unknown #532

Open dreher-in opened 1 week ago

dreher-in commented 1 week ago

I cloned your Space and get the following errors:

Runtime error

 the fused layernorm kernel
  warnings.warn("Please install apex from source (https://github.com/NVIDIA/apex) to use the fused layernorm kernel")
/usr/local/lib/python3.10/site-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
/usr/local/lib/python3.10/site-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

Downloading shards:  50%|█████     | 1/2 [00:15<00:15, 15.22s/it]

Downloading shards: 100%|██████████| 2/2 [00:32<00:00, 16.44s/it]
Downloading shards: 100%|██████████| 2/2 [00:32<00:00, 16.26s/it]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Loading checkpoint shards:  50%|█████     | 1/2 [00:11<00:11, 11.52s/it]

Loading checkpoint shards: 100%|██████████| 2/2 [00:26<00:00, 13.66s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:26<00:00, 13.34s/it]
Traceback (most recent call last):
  File "/home/user/app/app.py", line 396, in <module>
    def run_image_inference(
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/decorator.py", line 113, in _GPU
    client.startup_report()
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/client.py", line 45, in startup_report
    raise RuntimeError("Error while initializing ZeroGPU: Unknown")
RuntimeError: Error while initializing ZeroGPU: Unknown

Container logs:


===== Application Startup at 2024-06-22 18:17:57 =====

/usr/local/lib/python3.10/site-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
/usr/local/lib/python3.10/site-packages/colossalai/pipeline/schedule/_utils.py:19: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _register_pytree_node(OrderedDict, _odict_flatten, _odict_unflatten)
/usr/local/lib/python3.10/site-packages/torch/utils/_pytree.py:300: UserWarning: <class 'collections.OrderedDict'> is already registered as pytree node. Overwriting the previous registration.
  warnings.warn(
/usr/local/lib/python3.10/site-packages/colossalai/shardformer/layer/normalization.py:45: UserWarning: Please install apex from source (https://github.com/NVIDIA/apex) to use the fused layernorm kernel
  warnings.warn("Please install apex from source (https://github.com/NVIDIA/apex) to use the fused layernorm kernel")
/usr/local/lib/python3.10/site-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
/usr/local/lib/python3.10/site-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

Downloading shards:  50%|█████     | 1/2 [00:15<00:15, 15.22s/it]

Downloading shards: 100%|██████████| 2/2 [00:32<00:00, 16.44s/it]
Downloading shards: 100%|██████████| 2/2 [00:32<00:00, 16.26s/it]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Loading checkpoint shards:  50%|█████     | 1/2 [00:11<00:11, 11.52s/it]

Loading checkpoint shards: 100%|██████████| 2/2 [00:26<00:00, 13.66s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:26<00:00, 13.34s/it]
Traceback (most recent call last):
  File "/home/user/app/app.py", line 396, in <module>
    def run_image_inference(
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/decorator.py", line 113, in _GPU
    client.startup_report()
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/client.py", line 45, in startup_report
    raise RuntimeError("Error while initializing ZeroGPU: Unknown")
RuntimeError: Error while initializing ZeroGPU: Unknown
/usr/local/lib/python3.10/site-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(

Fabrice-TIERCELIN commented 1 week ago

Hi,

I'm facing the same error with SUPIR:

https://huggingface.co/spaces/Fabrice-TIERCELIN/SUPIR

So I think it's not related to Open-Sora but to Hugging Face. I only restarted the same code, and this is the first time I have seen this error. I think something is down on Hugging Face's side.
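The traceback is consistent with this reading: the checkpoint shards load successfully, and the exception is then raised while `app.py` is still defining `run_image_inference` (line 396), inside the ZeroGPU decorator's call to `client.startup_report()`. No Open-Sora inference code has run at that point. Below is a minimal, hypothetical Python sketch of that pattern, assuming a decorator that performs its platform check when it is applied rather than when the decorated function is called. `GPU`, `startup_report` and `InfraState` are illustrative stand-ins, not the real `spaces` package.

```python
import functools


class InfraState:
    """Hypothetical simulated platform state; healthy=False models the outage."""
    healthy = True


def startup_report() -> None:
    # Hypothetical stand-in: runs once, when the decorator is applied.
    if not InfraState.healthy:
        raise RuntimeError("Error while initializing ZeroGPU: Unknown")


def GPU(func):
    # The platform check happens here, at decoration (definition) time,
    # before `func` is ever called.
    startup_report()

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)

    return wrapper


def define_app():
    """Mimics app.py: the failure surfaces at the decorated `def` line."""
    @GPU
    def run_image_inference(prompt: str) -> str:
        return f"video for {prompt!r}"

    return run_image_inference


if __name__ == "__main__":
    print(define_app()("a cat"))  # prints: video for 'a cat'
    InfraState.healthy = False
    try:
        define_app()  # fails while defining the function; no request served
    except RuntimeError as err:
        print(f"startup failed: {err}")
```

Because the check runs at definition time, the same app code fails or succeeds purely depending on the platform state, which matches restarting unchanged code and seeing the error for the first time.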

Fabrice-TIERCELIN commented 1 week ago

Message from Hysts, Hugging Face staff:

Thanks for reporting. I think it's due to an infra issue and it's already reported internally.

github-actions[bot] commented 1 day ago

This issue is stale because it has been open for 7 days with no activity.