hpcaitech / Open-Sora

Open-Sora: Democratizing Efficient Video Production for All
https://hpcaitech.github.io/Open-Sora/
Apache License 2.0

RuntimeError: input must be a CUDA tensor #566

Closed: QiuLL closed this issue 4 weeks ago

QiuLL commented 3 months ago

python inference71.py configs/opensora-v1-2/inference/sample.py --num-frames 4s --resolution 720p --aspect-ratio 9:16 --prompt "a beautiful waterfall"

/data/miniconda/envs/opensora/lib/python3.10/site-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
/data/miniconda/envs/opensora/lib/python3.10/site-packages/transformers/utils/hub.py:142: FutureWarning: Using the environment variable HUGGINGFACE_CO_RESOLVE_ENDPOINT is deprecated and will be removed in Transformers v5. Use HF_ENDPOINT instead.
  warnings.warn(
/data/miniconda/envs/opensora/lib/python3.10/site-packages/colossalai/pipeline/schedule/_utils.py:19: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _register_pytree_node(OrderedDict, _odict_flatten, _odict_unflatten)
/data/miniconda/envs/opensora/lib/python3.10/site-packages/torch/utils/_pytree.py:300: UserWarning: <class 'collections.OrderedDict'> is already registered as pytree node. Overwriting the previous registration.
  warnings.warn(
/data/miniconda/envs/opensora/lib/python3.10/site-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
/data/miniconda/envs/opensora/lib/python3.10/site-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(

Traceback (most recent call last):
  File "/data/qll/Open-Sora/inference71.py", line 303, in <module>
    main()
  File "/data/qll/Open-Sora/inference71.py", line 265, in main
    samples = scheduler.sample(
  File "/data/qll/Open-Sora/opensora/schedulers/rf/__init__.py", line 52, in sample
    model_args = text_encoder.encode(prompts)
  File "/data/qll/Open-Sora/opensora/models/text_encoder/t5.py", line 192, in encode
    caption_embs, emb_masks = self.t5.get_text_embeddings(text)
  File "/data/qll/Open-Sora/opensora/models/text_encoder/t5.py", line 129, in get_text_embeddings
    text_encoder_embs = self.model(
  File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 1975, in forward
    encoder_outputs = self.encoder(
  File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 1110, in forward
    layer_outputs = layer_module(
  File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 694, in forward
    self_attention_outputs = self.layer[0](
  File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 600, in forward
    normed_hidden_states = self.layer_norm(hidden_states)
  File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/apex/normalization/fused_layer_norm.py", line 416, in forward
    return fused_rms_norm_affine(
  File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/apex/normalization/fused_layer_norm.py", line 215, in fused_rms_norm_affine
    return FusedRMSNormAffineFunction.apply(*args)
  File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/data/miniconda/envs/opensora/lib/python3.10/site-packages/apex/normalization/fused_layer_norm.py", line 75, in forward
    output, invvar = fused_layer_norm_cuda.rms_forward_affine(
RuntimeError: input must be a CUDA tensor

How can I solve this?

JThh commented 3 months ago

Can you set CUDA_VISIBLE_DEVICES=0 before the inference command? If that does not help, print the devices of the input text tensors and of the T5 encoder model to check whether they are already on the GPU.
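
A minimal way to run that check outside Open-Sora (a sketch only; t5-small is used here as a lightweight stand-in for the actual T5 text encoder, and the variable names are assumptions, not the project's code):

```python
import torch
from transformers import AutoTokenizer, T5EncoderModel

# Sketch: confirm that both the T5 encoder weights and the tokenized prompt
# end up on the GPU. If either of them stays on the CPU, apex's fused RMSNorm
# kernel fails with "RuntimeError: input must be a CUDA tensor".
device = "cuda" if torch.cuda.is_available() else "cpu"
print("CUDA available:", torch.cuda.is_available())

tokenizer = AutoTokenizer.from_pretrained("t5-small")                 # stand-in checkpoint
model = T5EncoderModel.from_pretrained("t5-small").to(device).eval()

inputs = tokenizer(["a beautiful waterfall"], return_tensors="pt").to(device)
print("model device:", next(model.parameters()).device)  # expect cuda:0
print("input device:", inputs["input_ids"].device)       # expect cuda:0

with torch.no_grad():
    out = model(**inputs)
print("embeddings:", out.last_hidden_state.shape)
```

If the inputs print cpu while the model is on cuda:0 (or the other way round), the fix is to move the offending side explicitly with .to(device) before the forward call in opensora/models/text_encoder/t5.py.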

FrankLeeeee commented 3 months ago

Hi, did you build apex from source?
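
One way to answer that without rebuilding anything is to check whether the compiled extension the traceback ends in, fused_layer_norm_cuda, is importable. This is only a sketch of the check, not an official diagnostic:

```python
# Sketch: verify that apex was built with its CUDA extensions.
# fused_layer_norm_cuda is the compiled module at the bottom of the traceback;
# a Python-only apex install (built without the CUDA extensions) does not ship it.
try:
    import fused_layer_norm_cuda  # noqa: F401
    print("apex fused layer norm CUDA extension is present")
except ImportError:
    print("apex CUDA extension missing: rebuild apex from source with CUDA extensions enabled")
```

In this particular traceback the extension clearly loaded (the error is raised inside rms_forward_affine), so the failure looks like a device-placement problem rather than a broken apex build.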

github-actions[bot] commented 3 months ago

This issue is stale because it has been open for 7 days with no activity.

jacobswan1 commented 2 months ago

I encountered the same situation. I built apex from source as instructed in installation.MD. Any clues for this? Did you resolve this? @QiuLL

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] commented 4 weeks ago

This issue was closed because it has been inactive for 7 days since being marked as stale.

TalRemez commented 2 weeks ago

Same problem. Any idea how to solve this?

5-Jeremy commented 2 weeks ago

I still get this error even after disabling the fused layernorm kernel (though with apex still installed).
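
Note that transformers' T5 replaces its layer norm with apex's FusedRMSNorm whenever apex is importable, which is why the traceback still goes through apex/normalization/fused_layer_norm.py even with Open-Sora's own fused-layernorm option turned off. A blunt debugging workaround, purely a sketch and not an official fix, is to hide apex from the process before transformers is imported:

```python
# Debugging workaround sketch (an assumption, not an official fix): make `import apex`
# fail so that transformers falls back to its pure-PyTorch T5LayerNorm instead of
# apex's FusedRMSNorm. This must run before transformers.models.t5 is imported,
# and it disables apex for the whole process, so use it only to isolate the bug.
import sys

sys.modules["apex"] = None               # a None entry makes `import apex` raise ImportError
sys.modules["apex.normalization"] = None

from transformers import T5EncoderModel  # T5 now builds with the plain T5LayerNorm

model = T5EncoderModel.from_pretrained("t5-small").to("cuda").eval()  # stand-in checkpoint
print(type(model.encoder.block[0].layer[0].layer_norm))  # expect T5LayerNorm, not FusedRMSNorm
```

If the error disappears with apex hidden, the CPU tensor was only surfacing inside the fused kernel; the underlying fix is still to make sure the prompts and the encoder are moved to the GPU.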