kabachuha / sd-webui-text2video

Auto1111 extension implementing text2video diffusion models (like ModelScope or VideoCrafter) using only Auto1111 webui dependencies

[Bug]: Error while processing rearrange-reduction pattern "b c f h w -> (b f) c h w". Input tensor shape: torch.Size([1, 4, 32, 32]). Additional info: {}. Expected 5 dimensions, got 4 #25

Closed: Pythonpa closed this issue 1 year ago

Pythonpa commented 1 year ago

Is there an existing issue for this?

Are you using the latest version of the extension?

What happened?

Running on a 3060 Ti card with 8 GB VRAM.

A "CUDA out of memory" error is displayed when I select the GPU to run. If I select the CPU instead, a different error appears. The error information is below.

Steps to reproduce the problem

① I have a 3060 Ti graphics card with 8 GB VRAM, but after installing the T2V extension as instructed and keeping the default T2V settings (24 frames, 256 pixels), the system still reports a "CUDA out of memory" error. I don't know what is wrong with the settings.

② Then I selected the CPU running mode. The system can process DDIM, but the following error is displayed:

Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
Starting text2video
False

DECODING FRAMES
torch.Size([24, 4, 32, 32])
STARTING VAE ON CPU
Exception occured

Error while processing rearrange-reduction pattern "b c f h w -> (b f) c h w". Input tensor shape: torch.Size([1, 4, 32, 32]). Additional info: {}. Expected 5 dimensions, got 4
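
For reference, the pattern in this error names five axes (b, c, f, h, w), so einops raises it for any 4-D input. A minimal standalone sketch of the failure and of the shape the pattern expects (assuming only torch and einops, not the extension's actual code path):

```python
import torch
from einops import rearrange

# A 4-D latent (b, c, h, w) is missing the frame axis the pattern names.
latent_4d = torch.zeros(1, 4, 32, 32)
try:
    rearrange(latent_4d, "b c f h w -> (b f) c h w")
except Exception as e:
    print(e)  # ... Expected 5 dimensions, got 4

# Adding a singleton frame axis satisfies the pattern.
latent_5d = latent_4d.unsqueeze(2)  # (b, c, f=1, h, w)
print(rearrange(latent_5d, "b c f h w -> (b f) c h w").shape)  # torch.Size([1, 4, 32, 32])
```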

What should have happened?

No response

WebUI and Deforum extension Commit IDs

webui commit id - a9fed7c3
txt2vid commit id - version 1.0b

What GPU were you using for launching?

3060Ti 8G VRAM

On which platform are you launching the webui backend with the extension?

Local PC setup (Windows)

Settings

WIN 10 + Edge Browser

Console logs

You are running torch 1.12.1+cu113.
The program is tested to work with torch 1.13.1.
To reinstall the desired version, run with commandline flag --reinstall-torch.
Beware that this will cause a lot of large files to be downloaded, as well as
there are reports of issues with training tab on the latest version.

Use --skip-version-check commandline argument to disable this check.
==============================================================================
=================================================================================
You are running xformers 0.0.14.dev.
The program is tested to work with xformers 0.0.16rc425.
To reinstall the desired version, run with commandline flag --reinstall-xformers.

Use --skip-version-check commandline argument to disable this check.
=================================================================================
Loading weights [4e704d22c3] from D:\AI_WebUI_SD\models\Stable-diffusion\SunshineMix&SunlightMix.safetensors
Creating model from config: D:\AI_WebUI_SD\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying xformers cross attention optimization.
Textual inversion embeddings loaded(7): easynegative, emb-rrf2, gelatapuella, ghst-3000, opt-6000, PureErosFace_V1, ulzzang-6500-v1.1
Textual inversion embeddings skipped(1): DaveSpaceFour
Model loaded in 1.4s (create model: 0.3s, apply weights to model: 0.6s, apply half(): 0.5s).
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 19.4s (import gradio: 2.6s, import ldm: 0.7s, other imports: 1.3s, list extensions: 1.4s, load scripts: 1.6s, load SD checkpoint: 1.5s, create ui: 10.1s, gradio launch: 0.2s).
ModelScope text2video extension for auto1111 webui
Git commit: ab1c4e74 (Mon Mar 20 22:22:46 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
Starting text2video
False

DECODING FRAMES
torch.Size([24, 4, 32, 32])
STARTING VAE ON CPU
Exception occured
Hint: the Python runtime threw an exception. Please check the troubleshooting page.
 Error while processing rearrange-reduction pattern "b c f h w -> (b f) c h w".
 Input tensor shape: torch.Size([1, 4, 32, 32]). Additional info: {}.
 Expected 5 dimensions, got 4

Additional information

No response
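
Judging from the log, the CPU VAE path hands the decoder a 4-D tensor where the rearrange expects (b, c, f, h, w); the "DECODING FRAMES torch.Size([24, 4, 32, 32])" line suggests the frames are stacked along the first axis. A hedged sketch of a guard one could try before the rearrange (ensure_5d is a hypothetical helper, not part of the extension's code):

```python
import torch
from einops import rearrange

def ensure_5d(latent: torch.Tensor) -> torch.Tensor:
    """Return a (b, c, f, h, w) view of a possibly 4-D latent.

    Assumption: a 4-D input is (f, c, h, w), i.e. frames stacked on the
    batch axis, as the DECODING FRAMES log line suggests.
    """
    if latent.dim() == 4:
        latent = latent.permute(1, 0, 2, 3).unsqueeze(0)  # -> (1, c, f, h, w)
    return latent

frames = ensure_5d(torch.zeros(24, 4, 32, 32))
print(rearrange(frames, "b c f h w -> (b f) c h w").shape)  # torch.Size([24, 4, 32, 32])
```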

jiangds2018 commented 1 year ago

the same bug occurred

15704080 commented 1 year ago

Yes, I also use an RTX 3070 with 8 GB and hit the same bug, whether running on GPU or CPU (low VRAM). Previously the GPU path required 16 GB, so I had been using the CPU; I just switched back from CPU to GPU and it seems to work now. I don't know what changed.

kabachuha commented 1 year ago

@15704080 I added support for half precision yesterday (which wasn't present in the original code for some reason), and now it requires less VRAM
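
For context, a minimal sketch of what half-precision conversion looks like in PyTorch (a generic illustration with a stand-in layer, not the extension's actual loading code):

```python
import torch

# .half() casts all parameters to float16, roughly halving the memory the
# weights occupy; activations computed in fp16 shrink accordingly.
model = torch.nn.Conv2d(4, 320, kernel_size=3)  # stand-in for a UNet block
if torch.cuda.is_available():
    model = model.half().cuda()
    x = torch.randn(1, 4, 32, 32, dtype=torch.float16, device="cuda")
    with torch.no_grad():
        print(model(x).shape, next(model.parameters()).dtype)  # torch.float16
```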

mariaWitch commented 1 year ago

I am currently getting this issue on the most recent commit of the extension.

Crimsonfart commented 1 year ago

Same error here with the CPU option checked. I have an RTX 3060 Ti with 8 GB VRAM.

Git commit: 9c5e5f90 (Tue Mar 21 13:17:17 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
Starting text2video
False
DDIM sampling tensor(1): 100%|█████████████████████████████████████████████████████████| 20/20 [00:53<00:00, 2.68s/it]
DECODING FRAMES
torch.Size([45, 4, 32, 32])
STARTING VAE ON CPU
Exception occured
Error while processing rearrange-reduction pattern "b c f h w -> (b f) c h w".
Input tensor shape: torch.Size([1, 4, 32, 32]). Additional info: {}.
Expected 5 dimensions, got 4