[Bug]: Txt2vid stuck, loading pipeline forever

killporter commented 1 year ago

Is there an existing issue for this?

[X] I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

Are you using the latest version of the extension?

[X] I have the modelscope text2video extension updated to the latest version and I still have the issue.

What happened?

Hello! I tried installing the extension. I copied the models (text2video_pytorch_model.pth, open_clip_pytorch_model.bin, VQGAN_autoencoder.pth) and the config JSON into a folder I created under the main folder of my UI installation (this is on Google Colab): Models>ModelScope>t2v

But when I try to run it, even with the default settings, I get stuck on:

Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})

It stays there for 10-15 minutes with nothing happening.
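
One thing worth ruling out before looking at memory is an incomplete copy of the model files. The following is a minimal sketch that checks a folder for the three weight files named in this report. The report does not give the config file's name, so the sketch accepts any .json file as the config; that is an assumption. The function name and the example path are hypothetical, not part of the extension.

```python
from pathlib import Path

# Weight file names as listed in this report.
EXPECTED_WEIGHTS = (
    "text2video_pytorch_model.pth",
    "open_clip_pytorch_model.bin",
    "VQGAN_autoencoder.pth",
)


def check_model_dir(model_dir):
    """Return a list of problems found; an empty list means the folder looks complete."""
    folder = Path(model_dir)
    if not folder.is_dir():
        return [f"folder not found: {folder}"]
    problems = []
    for name in EXPECTED_WEIGHTS:
        path = folder / name
        if not path.is_file():
            problems.append(f"missing file: {name}")
        elif path.stat().st_size == 0:
            problems.append(f"empty file (incomplete copy?): {name}")
    # Assumption: the config file name is unknown, so any *.json counts.
    if not any(folder.glob("*.json")):
        problems.append("no config .json file found")
    return problems


if __name__ == "__main__":
    # Hypothetical location relative to the webui root; adjust as needed.
    for line in check_model_dir("models/ModelScope/t2v") or ["all expected files present"]:
        print(line)
```

An empty result only shows that the files are present and non-empty. It does not verify that they are the correct or uncorrupted checkpoints.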

Steps to reproduce the problem

Just installing the extension and trying to run anything in it

What should have happened?

No response

WebUI and Deforum extension Commit IDs

webui commit id - txt2vid commit id -

What GPU were you using for launching?

Standard default GPU of the free Google Colab tier

On which platform are you launching the webui backend with the extension?

No response

Settings

Standard Setting

Console logs

Loading Unprompted v7.9.1 by Therefore Games
(SETUP) Initializing Unprompted object...
(SETUP) Loading configuration files...
(SETUP) Debug mode is False
Loading weights [92970aa785] from /content/gdrive/MyDrive/sd-backup/stable-diffusion-webui/models/Stable-diffusion/dreamlike-photoreal-2.0.safetensors
Creating model from config: /content/gdrive/MyDrive/sd/stable-diffusion-webui/configs/v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying xformers cross attention optimization.
Textual inversion embeddings loaded(0): 
Model loaded in 22.0s (load weights from disk: 16.5s, create model: 1.0s, apply weights to model: 2.9s, apply half(): 0.9s, move model to device: 0.7s).
Panorama Viewer: enable file-drag-and-drop into txt2img gallery...
Panorama_Viewer: adding sendto button in parent_elem_id: image_buttons_txt2img
Panorama_Viewer: adding sendto button in parent_elem_id: image_buttons_img2img
Panorama_Viewer: adding sendto button in parent_elem_id: image_buttons_extras
Running on public URL: https://5dade70d-2bd9-45bc.gradio.live/
✔ Connected
Startup time: 46.8s (import gradio: 3.8s, import ldm: 8.5s, other imports: 4.8s, list extensions: 1.6s, load scripts: 2.4s, load SD checkpoint: 22.0s, create ui: 1.1s, gradio launch: 2.4s, scripts app_started_callback: 0.1s).
ModelScope text2video extension for auto1111 webui
Git commit: 9f9bd657 (Fri Mar 24 22:49:32 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})

Additional information

No response

kabachuha commented 1 year ago

I suspect this may be happening because the system is running out of RAM

kabachuha commented 1 year ago

Try launching it again. One of the recent breaking updates (now fixed) used to load the pipeline twice, so it may have been filling up your RAM and causing the freeze
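
To test the RAM hypothesis on a Linux host such as Colab, one option is to read /proc/meminfo just before starting the pipeline. This is a minimal standard-library sketch, not something the extension provides; the function names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of field name -> integer value.

    Most fields are reported in kB; a few (such as HugePages_Total) are plain counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_gib(meminfo):
    """Return MemAvailable converted to GiB, or None if the field is absent."""
    kib = meminfo.get("MemAvailable")
    return None if kib is None else kib / (1024 ** 2)


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as fh:
            info = parse_meminfo(fh.read())
    except OSError:
        print("/proc/meminfo is not readable on this system")
    else:
        gib = available_gib(info)
        print("MemAvailable not reported" if gib is None else f"available RAM: {gib:.1f} GiB")
```

Running this before and during loading shows whether available memory is actually shrinking toward zero while the pipeline appears stuck.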

erballe commented 1 year ago

I have the same problem on a g4dn.xlarge AWS EC2 instance (NVIDIA T4 GPU with an Intel Cascade Lake CPU). RAM does not seem to be the problem (10 GB used of 15 GB available). It stays loading forever.

G-force78 commented 1 year ago

Same here on Colab. Eventually it runs out of memory and gives a ^C error

github-actions[bot] commented 1 year ago

This issue has been closed due to incorrect formatting. Please address the following mistakes and reopen the issue:

Include THE FULL LOG FROM THE START OF THE WEBUI in the issue description.
Provide a valid commit ID in the format 'commit id - [commit_hash]' both for the WebUI and the Extension.

dvschultz commented 1 year ago

~~Are there any updates to this issue? I’m trying to run it now on an A100 and High RAM settings on Colab and still see this.~~

Never mind, this was a new Colab GUI issue. I was on a T4, which was 100% the cause.
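
Since the runtime here turned out to be on a T4 rather than the intended A100, it can help to confirm the attached GPU from inside the session before launching. The sketch below shells out to nvidia-smi with its CSV query options; the helper names are hypothetical, and it returns None when nvidia-smi is not installed or fails.

```python
import shutil
import subprocess


def parse_gpu_csv(output):
    """Parse 'name, memory.total' CSV rows (no header) into (name, memory) tuples."""
    gpus = []
    for line in output.strip().splitlines():
        name, _, memory = line.partition(",")
        gpus.append((name.strip(), memory.strip()))
    return gpus


def query_gpus():
    """Return the GPUs reported by nvidia-smi, or None if it is unavailable or fails."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    gpus = query_gpus()
    if gpus is None:
        print("nvidia-smi not available")
    else:
        for name, memory in gpus:
            print(f"{name}: {memory}")
```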

kabachuha / sd-webui-text2video