kabachuha / sd-webui-text2video

Auto1111 extension implementing text2video diffusion models (like ModelScope or VideoCrafter) using only Auto1111 webui dependencies

[Bug]: the same seed creates a different video every time with vid2vid #144

Closed: alexfredo closed this issue 1 year ago

alexfredo commented 1 year ago

Is there an existing issue for this?

Are you using the latest version of the extension?

What happened?

Hi, is it normal that the same seed creates a different video every time with vid2vid, even if I don't change any settings? I have tried it on my computer and on Google Colab. How do I fix that? Thanks.
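
For reference, here is a minimal sketch of what fixed-seed reproducibility usually requires in a PyTorch diffusion pipeline. This is illustrative only, not the extension's actual code; the `seed_everything` helper and the latent shape are assumptions (the shape matches the one printed in the console logs below):

```python
# Hypothetical sketch, not the extension's code: for the same seed to give the
# same vid2vid result, every RNG the sampler touches must be re-seeded on each
# run, and cuDNN must be forced into deterministic mode.
import random

import numpy as np
import torch

def seed_everything(seed: int) -> torch.Generator:
    """Reset all RNG sources a diffusion sampler typically draws from."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)                     # seeds CPU and CUDA RNGs
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True   # avoid non-deterministic kernels
    torch.backends.cudnn.benchmark = False
    # A dedicated generator keeps the initial noise reproducible even if other
    # code consumes random numbers between runs.
    return torch.Generator(device="cuda").manual_seed(seed)

gen = seed_everything(1)
# Initial latent noise for a 25-frame 32x32 latent (shape taken from the logs).
noise = torch.randn((1, 4, 25, 32, 32), generator=gen,
                    device="cuda", dtype=torch.float16)
```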

Steps to reproduce the problem

Set the seed to 1 and generate twice.

What should have happened?

It should create the same video both times.

WebUI and Deforum extension Commit IDs

webui commit id - a9eab236
txt2vid commit id - unknown: I get an error when I run `git rev-parse HEAD`: "fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree. Use '--' to separate paths from revisions, like this: 'git [...] -- [...]' HEAD". I have installed the latest version of the extension.
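
For what it's worth, `git rev-parse HEAD` has to be run from inside the extension's own folder to report its commit. A small sketch of how one might read it programmatically; the path below is an assumption for this install, not taken from the report:

```python
# Hypothetical helper: read the extension's commit id by running git inside
# the extension directory. The directory path is an assumption.
import subprocess

ext_dir = r"C:\AUTOMATIC1111\extensions\sd-webui-text2video"
commit = subprocess.run(
    ["git", "-C", ext_dir, "rev-parse", "HEAD"],
    capture_output=True, text=True, check=True,
).stdout.strip()
print(commit)
```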

What GPU were you using for launching?

NVIDIA GeForce RTX 2060

On which platform are you launching the webui backend with the extension?

Local PC setup (Windows), Google Colab (Other)

Settings

[screenshot of settings attached: settingsbug]

Console logs

venv "C:\AUTOMATIC1111\venv\Scripts\Python.exe"
Python 3.10.0 (tags/v3.10.0:b494f59, Oct  4 2021, 19:00:18) [MSC v.1929 64 bit (AMD64)]
Commit hash: <none>
Installing requirements for Web UI

Installing requirements for TemporalKit extension

Launching Web UI with arguments:
Stop Motion CN - Running Preload
Set Gradio Queue: True
No module 'xformers'. Proceeding without it.
[AddNet] Updating model hashes...
100%|████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 6683.62it/s]
[AddNet] Updating model hashes...
100%|███████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 10027.02it/s]
ControlNet v1.1.119
ControlNet v1.1.119
Loading weights [c0d1994c73] from C:\AUTOMATIC1111\models\Stable-diffusion\realisticVisionV20_v20.safetensors
Creating model from config: C:\AUTOMATIC1111\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Couldn't find VAE named JennaO.safetensors; using None instead
Applying cross attention optimization (Doggettx).
Textual inversion embeddings loaded(1): charturner
Model loaded in 12.9s (load weights from disk: 0.5s, create model: 0.5s, apply weights to model: 5.6s, apply half(): 0.8s, move model to device: 1.1s, load textual inversion embeddings: 4.3s).
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 35.9s (import torch: 9.8s, import gradio: 1.9s, import ldm: 1.0s, other imports: 1.9s, setup codeformer: 0.2s, load scripts: 4.0s, load SD checkpoint: 13.0s, create ui: 4.0s, gradio launch: 0.2s).
text2video — The model selected is:  ModelScope
 text2video extension for auto1111 webui
Git commit:
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
device cuda
got a request to *vid2vid* an existing video.
Trying to extract frames from video with input FPS of 30.0. Please wait patiently.
Successfully extracted 181.0 frames from video.
Loading frames: 100%|██████████████████████████████████████████████████████████████████| 25/25 [00:00<00:00, 42.64it/s]
Converted the frames to tensor (1, 25, 3, 256, 256)
Computing latents
STARTING VAE ON GPU
VAE HALVED
Working in vid2vid mode
  0%|                                                                                            | 0/1 [00:00<?, ?it/s]Making a video with the following parameters:
{'prompt': 'scarlett johansson', 'n_prompt': 'text, watermark, copyright, blurry, nsfw', 'steps': 30, 'frames': 24, 'seed': 1, 'scale': 17, 'width': 256, 'height': 256, 'eta': 0.0, 'cpu_vae': 'GPU (half precision)', 'device': device(type='cuda'), 'skip_steps': 7, 'strength': 0}
latents torch.Size([1, 4, 25, 32, 32]) tensor(0.0486, device='cuda:0', dtype=torch.float16) tensor(0.9185, device='cuda:0', dtype=torch.float16)
huh tensor(793) tensor([793], device='cuda:0')
DDIM sampling tensor(1): 100%|█████████████████████████████████████████████████████████| 24/24 [00:51<00:00,  2.14s/it]
STARTING VAE ON GPU. 13 CHUNKS TO PROCESS██████████████████████████████████████████████| 24/24 [00:51<00:00,  2.06s/it]
VAE HALVED
DECODING FRAMES
VAE FINISHED
torch.Size([25, 3, 256, 256])
output/mp4s/20230504_191405469818.mp4
text2video finished, saving frames to C:\AUTOMATIC1111\outputs/img2img-images\text2video\20230504191218
Got a request to stitch frames to video using FFmpeg.
Frames:
C:\AUTOMATIC1111\outputs/img2img-images\text2video\20230504191218\%06d.png
To Video:
C:\AUTOMATIC1111\outputs/img2img-images\text2video\20230504191218\vid.mp4
Stitching *video*...
Stitching *video*...
Video stitching done in 0.56 seconds!
t2v complete, result saved at C:\AUTOMATIC1111\outputs/img2img-images\text2video\20230504191218
text2video — The model selected is:  ModelScope
 text2video extension for auto1111 webui
Git commit:
Starting text2video
Pipeline setup
device cuda
got a request to *vid2vid* an existing video.
Trying to extract frames from video with input FPS of 30.0. Please wait patiently.
Successfully extracted 181.0 frames from video.
Loading frames: 100%|██████████████████████████████████████████████████████████████████| 25/25 [00:00<00:00, 50.27it/s]
Converted the frames to tensor (1, 25, 3, 256, 256)
Computing latents
STARTING VAE ON GPU
VAE HALVED
Working in vid2vid mode
  0%|                                                                                            | 0/1 [00:00<?, ?it/s]Making a video with the following parameters:
{'prompt': 'scarlett johansson', 'n_prompt': 'text, watermark, copyright, blurry, nsfw', 'steps': 30, 'frames': 24, 'seed': 1, 'scale': 17, 'width': 256, 'height': 256, 'eta': 0.0, 'cpu_vae': 'GPU (half precision)', 'device': device(type='cuda'), 'skip_steps': 7, 'strength': 0}
latents torch.Size([1, 4, 25, 32, 32]) tensor(0.0486, device='cuda:0', dtype=torch.float16) tensor(0.9185, device='cuda:0', dtype=torch.float16)
huh tensor(793) tensor([793], device='cuda:0')
DDIM sampling tensor(1): 100%|█████████████████████████████████████████████████████████| 24/24 [00:56<00:00,  2.36s/it]
STARTING VAE ON GPU. 13 CHUNKS TO PROCESS██████████████████████████████████████████████| 24/24 [00:56<00:00,  2.36s/it]
VAE HALVED
DECODING FRAMES
VAE FINISHED
torch.Size([25, 3, 256, 256])
output/mp4s/20230504_191518691713.mp4
text2video finished, saving frames to C:\AUTOMATIC1111\outputs/img2img-images\text2video\20230504191415
Got a request to stitch frames to video using FFmpeg.
Frames:
C:\AUTOMATIC1111\outputs/img2img-images\text2video\20230504191415\%06d.png
To Video:
C:\AUTOMATIC1111\outputs/img2img-images\text2video\20230504191415\vid.mp4
Stitching *video*...
Stitching *video*...
Video stitching done in 0.36 seconds!
t2v complete, result saved at C:\AUTOMATIC1111\outputs/img2img-images\text2video\20230504191415
text2video — The model selected is:  ModelScope
 text2video extension for auto1111 webui
Git commit:
Starting text2video
Pipeline setup
device cuda
got a request to *vid2vid* an existing video.
Trying to extract frames from video with input FPS of 30.0. Please wait patiently.
Successfully extracted 181.0 frames from video.
Loading frames: 100%|██████████████████████████████████████████████████████████████████| 25/25 [00:00<00:00, 44.21it/s]
Converted the frames to tensor (1, 25, 3, 256, 256)
Computing latents
STARTING VAE ON GPU
VAE HALVED
Working in vid2vid mode
  0%|                                                                                            | 0/1 [00:00<?, ?it/s]Making a video with the following parameters:
{'prompt': 'scarlett johansson', 'n_prompt': 'text, watermark, copyright, blurry, nsfw', 'steps': 30, 'frames': 24, 'seed': 1, 'scale': 17, 'width': 256, 'height': 256, 'eta': 0.0, 'cpu_vae': 'GPU (half precision)', 'device': device(type='cuda'), 'skip_steps': 7, 'strength': 0}
latents torch.Size([1, 4, 25, 32, 32]) tensor(0.0486, device='cuda:0', dtype=torch.float16) tensor(0.9185, device='cuda:0', dtype=torch.float16)
huh tensor(793) tensor([793], device='cuda:0')
DDIM sampling tensor(1): 100%|█████████████████████████████████████████████████████████| 24/24 [00:57<00:00,  2.41s/it]
STARTING VAE ON GPU. 13 CHUNKS TO PROCESS██████████████████████████████████████████████| 24/24 [00:57<00:00,  2.41s/it]
VAE HALVED
DECODING FRAMES
VAE FINISHED
torch.Size([25, 3, 256, 256])
output/mp4s/20230504_191631766462.mp4
text2video finished, saving frames to C:\AUTOMATIC1111\outputs/img2img-images\text2video\20230504191527
Got a request to stitch frames to video using FFmpeg.
Frames:
C:\AUTOMATIC1111\outputs/img2img-images\text2video\20230504191527\%06d.png
To Video:
C:\AUTOMATIC1111\outputs/img2img-images\text2video\20230504191527\vid.mp4
Stitching *video*...
Stitching *video*...
Video stitching done in 0.37 seconds!
t2v complete, result saved at C:\AUTOMATIC1111\outputs/img2img-images\text2video\20230504191527

Additional information

No response

github-actions[bot] commented 1 year ago

This issue has been closed due to incorrect formatting. Please address the following mistakes and reopen the issue: