kabachuha / sd-webui-text2video

Auto1111 extension implementing text2video diffusion models (like ModelScope or VideoCrafter) using only Auto1111 webui dependencies

[Bug]: When I upload the video in Vid2VId tab it works as txt2vid not vid2vid. #63

Closed: toyxyz closed this issue 1 year ago

toyxyz commented 1 year ago

Is there an existing issue for this?

Are you using the latest version of the extension?

What happened?

When I upload a video in the Vid2Vid tab, it runs as txt2vid, not vid2vid. I also tried 'Input video path', but it doesn't work either.

Steps to reproduce the problem

Go to the Vid2vid tab, upload a single video file and press the Generate button.

What should have happened?

Vid2vid should run, not txt2vid.

WebUI and Deforum extension Commit IDs

webui commit id - a9fed7c364061ae6efb37f797b6b522cb3cf7aa2
txt2vid commit id - 44a82864a0134bfc8d78456e8a63532f38a57cde

What GPU were you using for launching?

3090ti

On which platform are you launching the webui backend with the extension?

Local PC setup (Windows)

Settings

(screenshot of settings attached)

Console logs

Git commit: 44a82864 (Fri Mar 24 19:57:10 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
device cuda
Working in txt2vid mode
latents torch.Size([1, 4, 24, 32, 32]) tensor(-0.0031, device='cuda:0') tensor(1.0040, device='cuda:0')
DDIM sampling tensor(1): 100%|█████████████████████████████████████████████████████████| 31/31 [00:22<00:00,  1.35it/s]
STARTING VAE ON GPU. 24 CHUNKS TO PROCESS
VAE HALVED
DECODING FRAMES
VAE FINISHED
torch.Size([24, 3, 256, 256])
output/mp4s/20230325_072326329309.mp4
  0%|                                                                                            | 0/1 [00:00<?, ?it/s]
latents torch.Size([1, 4, 24, 32, 32]) tensor(-0.0048, device='cuda:0') tensor(0.9962, device='cuda:0')
DDIM sampling tensor(1): 100%|█████████████████████████████████████████████████████████| 31/31 [00:15<00:00,  2.06it/s]
STARTING VAE ON GPU. 24 CHUNKS TO PROCESS
VAE HALVED
DECODING FRAMES
VAE FINISHED
torch.Size([24, 3, 256, 256])
output/mp4s/20230325_072346565309.mp4
text2video finished, saving frames to F:\\Download\\SDWEB_OUTPUT\\img2img-images\text2video-modelscope\20230325072237
Got a request to stitch frames to video using FFmpeg.
Frames:
F:\\Download\\SDWEB_OUTPUT\\img2img-images\text2video-modelscope\20230325072237\%06d.png
To Video:
F:\\Download\\SDWEB_OUTPUT\\img2img-images\text2video-modelscope\20230325072237\vid.mp4
Stitching *video*...
Stitching *video*...
Video stitching done in 0.46 seconds!
t2v complete, result saved at F:\\Download\\SDWEB_OUTPUT\\img2img-images\text2video-modelscope\20230325072237


Additional information

_No response_
kabachuha commented 1 year ago

See https://github.com/deforum-art/sd-webui-modelscope-text2video/issues/62 and try lowering your denoising strength from 1 to around 0.5.
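For context, this is a minimal sketch (not the extension's actual code; the function name noise_video_latents and its signature are hypothetical) of how img2img/vid2vid pipelines commonly apply denoising strength. It illustrates why a strength of 1.0 behaves like txt2vid: the input-video latents are replaced entirely by noise before sampling.

```python
import torch

def noise_video_latents(video_latents: torch.Tensor,
                        denoising_strength: float,
                        num_inference_steps: int = 31,
                        seed: int = 0):
    """Blend the encoded input-video latents with fresh noise.

    Denoising strength decides how much of the input survives:
      * 1.0  -> start from pure noise; the input video is ignored,
                so the run is effectively txt2vid
      * ~0.5 -> keep roughly half of the input video's structure
      * 0.0  -> return the input unchanged (nothing left to denoise)
    """
    generator = torch.Generator(device=video_latents.device).manual_seed(seed)
    noise = torch.randn(video_latents.shape, generator=generator,
                        device=video_latents.device, dtype=video_latents.dtype)

    # How many sampling steps will actually be run on the noised latents.
    start_step = int(num_inference_steps * denoising_strength)

    # A simple linear blend stands in for the scheduler's add_noise() here.
    noisy = (1.0 - denoising_strength) * video_latents + denoising_strength * noise
    return noisy, start_step
```

With denoising_strength=1.0 the blend above discards video_latents entirely, which matches the "Working in txt2vid mode" behavior in the log; values around 0.5 keep enough of the input video for vid2vid to have a visible effect.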