hpcaitech / Open-Sora

Open-Sora: Democratizing Efficient Video Production for All
https://hpcaitech.github.io/Open-Sora/
Apache License 2.0

RuntimeError: CUDA error: no kernel image is available for execution on the device #596

baoblei opened this issue 1 month ago

baoblei commented 1 month ago
Traceback (most recent call last):
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/gradio/queueing.py", line 541, in process_events
    response = await route_utils.call_process_api(
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/gradio/route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/gradio/blocks.py", line 1928, in process_api
    result = await self.call_function(
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/gradio/blocks.py", line 1514, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/gradio/utils.py", line 833, in wrapper
    response = f(*args, **kwargs)
  File "/root/mySora/Open-Sora/gradio/app.py", line 414, in run_image_inference
    return run_inference(
  File "/root/mySora/Open-Sora/gradio/app.py", line 356, in run_inference
    samples = scheduler.sample(
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/opensora/schedulers/rf/__init__.py", line 52, in sample
    model_args = text_encoder.encode(prompts)
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/opensora/models/text_encoder/t5.py", line 190, in encode
    caption_embs, emb_masks = self.t5.get_text_embeddings(text)
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/opensora/models/text_encoder/t5.py", line 127, in get_text_embeddings
    text_encoder_embs = self.model(
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 1980, in forward
    encoder_outputs = self.encoder(
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 1115, in forward
    layer_outputs = layer_module(
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 695, in forward
    self_attention_outputs = self.layer[0](
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 602, in forward
    attention_output = self.SelfAttention(
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 544, in forward
    position_bias = self.compute_bias(real_seq_length, key_length, device=scores.device)
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 441, in compute_bias
    context_position = torch.arange(query_length, dtype=torch.long, device=device)[:, None]
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

CUDA 12.4

FrankLeeeee commented 1 month ago

Hi @baoblei, can you check your version compatibility with the following?

# check pytorch's cuda version
python -c "import torch;print(torch.version.cuda)"

# check your cuda driver version (shown in the top right corner of the output)
nvidia-smi

# check your cuda runtime version
nvcc -V
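
It may also be worth checking whether this PyTorch build ships kernels for your GPU's architecture at all, since this error usually points to a compute-capability mismatch rather than a driver problem. A quick check, assuming GPU 0 is the one you run on:

# list the architectures this PyTorch build was compiled for
python -c "import torch; print(torch.cuda.get_arch_list())"

# print the compute capability of GPU 0; if sm_<major><minor> is not in the list above,
# the "no kernel image is available" error is expected
python -c "import torch; print(torch.cuda.get_device_capability(0))"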
baoblei commented 1 month ago

Thanks, I checked my versions:

- PyTorch's CUDA version: 11.8
- driver version: 12.4
- runtime version: 11.8

Sorry I didn't make it clear before: this error occurred in v1.2, and I tried again in v1.1, but the same error occurred.

JThh commented 1 month ago

I'd suggest you reinstall PyTorch: conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
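
After reinstalling, a small sanity check (not from the repo, just a minimal end-to-end test) is to run one CUDA op and confirm it no longer raises:

# allocate a tensor on the GPU and multiply it; if this prints a tensor,
# the installed kernels match the device
python -c "import torch; x = torch.randn(4, 4, device='cuda'); print(x @ x)"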

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 7 days with no activity.