kabachuha / sd-webui-text2video

Auto1111 extension implementing text2video diffusion models (like ModelScope or VideoCrafter) using only Auto1111 webui dependencies

[Bug]: When I upload the video in Vid2VId tab it works as txt2vid not vid2vid. #63

Closed: toyxyz closed this issue 1 year ago

toyxyz commented 1 year ago

Is there an existing issue for this?

Are you using the latest version of the extension?

What happened?

When I upload a video in the Vid2Vid tab, it runs as txt2vid, not vid2vid. I also tried 'Input video path', but it doesn't work either.

Steps to reproduce the problem

Go to the Vid2vid tab, upload a single video file and press the Generate button.

What should have happened?

Vid2vid should run, not txt2vid.

WebUI and Deforum extension Commit IDs

webui commit id - a9fed7c364061ae6efb37f797b6b522cb3cf7aa2
txt2vid commit id - 44a82864a0134bfc8d78456e8a63532f38a57cde

What GPU were you using for launching?

3090ti

On which platform are you launching the webui backend with the extension?

Local PC setup (Windows)

Settings

(screenshot of settings attached)

Console logs

Git commit: 44a82864 (Fri Mar 24 19:57:10 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
device cuda
Working in txt2vid mode
latents torch.Size([1, 4, 24, 32, 32]) tensor(-0.0031, device='cuda:0') tensor(1.0040, device='cuda:0')
DDIM sampling tensor(1): 100%|█████████████████████████████████████████████████████████| 31/31 [00:22<00:00,  1.35it/s]
STARTING VAE ON GPU. 24 CHUNKS TO PROCESS
VAE HALVED
DECODING FRAMES
VAE FINISHED
torch.Size([24, 3, 256, 256])
output/mp4s/20230325_072326329309.mp4
  0%|                                                                                            | 0/1 [00:00<?, ?it/s]
latents torch.Size([1, 4, 24, 32, 32]) tensor(-0.0048, device='cuda:0') tensor(0.9962, device='cuda:0')
DDIM sampling tensor(1): 100%|█████████████████████████████████████████████████████████| 31/31 [00:15<00:00,  2.06it/s]
STARTING VAE ON GPU. 24 CHUNKS TO PROCESS
VAE HALVED
DECODING FRAMES
VAE FINISHED
torch.Size([24, 3, 256, 256])
output/mp4s/20230325_072346565309.mp4
text2video finished, saving frames to F:\\Download\\SDWEB_OUTPUT\\img2img-images\text2video-modelscope\20230325072237
Got a request to stitch frames to video using FFmpeg.
Frames:
F:\\Download\\SDWEB_OUTPUT\\img2img-images\text2video-modelscope\20230325072237\%06d.png
To Video:
F:\\Download\\SDWEB_OUTPUT\\img2img-images\text2video-modelscope\20230325072237\vid.mp4
Stitching *video*...
Stitching *video*...
Video stitching done in 0.46 seconds!
t2v complete, result saved at F:\\Download\\SDWEB_OUTPUT\\img2img-images\text2video-modelscope\20230325072237


Additional information

_No response_
kabachuha commented 1 year ago

See https://github.com/deforum-art/sd-webui-modelscope-text2video/issues/62 and try lowering your denoising strength from 1 to around 0.5.
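For context, this is a minimal sketch (not the extension's actual code; the function name noise_video_latents and its signature are hypothetical) of how img2img/vid2vid pipelines commonly apply denoising strength. It illustrates why a strength of 1.0 behaves like txt2vid: the input-video latents are replaced entirely by noise before sampling.

```python
import torch

def noise_video_latents(video_latents: torch.Tensor,
                        denoising_strength: float,
                        num_inference_steps: int = 31,
                        seed: int = 0):
    """Blend the encoded input-video latents with fresh noise.

    Denoising strength decides how much of the input survives:
      * 1.0  -> start from pure noise; the input video is ignored,
                so the run is effectively txt2vid
      * ~0.5 -> keep roughly half of the input video's structure
      * 0.0  -> return the input unchanged (nothing left to denoise)
    """
    generator = torch.Generator(device=video_latents.device).manual_seed(seed)
    noise = torch.randn(video_latents.shape, generator=generator,
                        device=video_latents.device, dtype=video_latents.dtype)

    # How many sampling steps will actually be run on the noised latents.
    start_step = int(num_inference_steps * denoising_strength)

    # A simple linear blend stands in for the scheduler's add_noise() here.
    noisy = (1.0 - denoising_strength) * video_latents + denoising_strength * noise
    return noisy, start_step
```

With denoising_strength=1.0 the blend above discards video_latents entirely, which matches the "Working in txt2vid mode" behavior in the log; values around 0.5 keep enough of the input video for vid2vid to have a visible effect.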