intellerce / controlanimate

ControlAnimate Library
https://www.intellerce.com
Apache License 2.0
47 stars 3 forks

VAE is probably not being applied correctly #4

Closed SyedSherjeel closed 9 months ago

SyedSherjeel commented 9 months ago

Hey, the results are quite dark when passed through the pipeline. Here are my configuration and a sample output:

[Screenshot 2023-11-24 at 12 54 44 PM]

SyedSherjeel commented 9 months ago

Here is my configuration. I suspect the VAE is not being applied correctly.

Config File:

######################################################
# INPUTS
######################################################
input_video_path: "beautiful_girl_test.mp4" # Path to the input video file
output_video_dir: "/home/evobits/sherjeel/vid2vid/controlanimate/controlanimate/output" # Directory to save the outputs

save_frames: 0 # 0: No, 1: Yes

# Width and Height of the Input and Output videos
# If zero the input's video dimension will be used otherwise the input will be resized
width: 512
height: 768

# (blurred background)+
prompt: "(Highest quality)++ colorful++ detailed++ beautiful++ (wonder woman)++ woman+ with (perfect pretty face)++ (female superhero)+, (perfect eyes)++, (fully dressed)++, (sharp features)+, (commanding posture)+, (perfect hands)++, (perfect fingers)++, (phenomenal aesthetic)+, masterpiece, best quality, sharp focus, 8k, uhd, dslr, cinematic photo, realistic photo, high quality"
n_prompt: "easynegative+, pale++, (dark hands)+++, nudity, mask++, (bad face)+++, (bad mouth)+++, (worst quality)+, (missing hand)+++, (missing fingers)+++,,(low quality)+, lowres, bad anatomy, (text, font, logo, copyright, watermark)++, wrong face, wrong hands, wrong legs, wrong feet, (nsfw,nude)+, wrong fingers"

# Additional Basic Parameters
start_time: "00:00:00" # Time in HH:MM:SS format to start reading the input video
end_time: "00:00:10" # Time in HH:MM:SS format to stop reading the input video

# Use the last frame of each AnimateDiff output sequence as reference for the next epoch using the following img2img strength
overlap_strength: 0.85

# Native LCM Model?
use_lcm: 0 # 0: No, 1: Yes - Use original LCM model. If used, the settings related to LoRA and DreamBooth will be ignored as LCM models do not support them.
# It is worth noting that LCM does not support negative prompts at the moment.
# Also, note that LCM-LoRA is different from native LCM and its usage does not disable any other features.

######################################################
# MODELS
######################################################

# Base model that AnimateDiff uses to create the initial architecture from (it needs to be in HuggingFace format (.bin))
pretrained_model_path: "models/StableDiffusion/stable-diffusion-v1-5"

# Optional Alternative AutoEncoder (VAE)
vae_path: "models/VAE/vae-ft-mse-840000-ema-pruned.ckpt" #"models/VAE/vae-ft-ema-560000-ema-pruned.safetensors"

# Optional DreamBooth Model (full)
dreambooth_path: "models/DreamBooth_LoRA/dreamshaper_8.safetensors" #"models/DreamBooth_LoRA/aZovyaRPGArtistTools_v3.safetensors"

# Optional LoRA model to be used
lora_model_path: "models/DreamBooth_LoRA/lcm_lora.safetensors"
lora_weight: 1.0

# Motion Module to be used - versions of the config and the model must match
inference_config_path: "configs/inference/inference-v2.yaml"
motion_module: "models/Motion_Module/mm_sd_v15_v2.ckpt"

# ControlNets
# Optional ControlNet Models to be used - will be downloaded automatically
controlnets:

cond_scale:

guess_mode: 1 # 0: No, 1: Yes - To use guess mode in controlnet or not.

######################################################
# PARAMETERS
######################################################

# IP-Adapter:
use_ipadapter: 0 # 0: No, 1: Yes
ipa_scale: 0.65 # Strength of IP-Adapter
do_initial_generation: 1 # 0: No, 1: Yes -> Generate a few initial frames to be used as baseline for the next image generations.

# Upscaling and Face Restoration:
upscale: 2 # Upscaler value for the input image
use_face_enhancer: 0 # 0: No, 1: Yes
upscale_first: 1 # Upscale before applying face enhancement - better results but slower: 0: No, 1: Yes

frame_count: 16 # How many co-related frames are produced by AnimateDiff - defaults to 16
overlap_length: 8 # Number of frames from previous output frames to be present in the current frames (helps with consistency)

seed: 23711 # Random Seed
steps: 12 # Denoising steps
guidance_scale: 1.35
strength: 1.0 # Strength of the noise to be added to input latents - if 1.0 the img2img effect is nil

# Choice of scheduler: "DDIMScheduler", "EulerDiscreteScheduler", "DPMSolverMultistepScheduler", "EulerAncestralDiscreteScheduler", "LMSDiscreteScheduler", "PNDMScheduler", "LCMScheduler"
scheduler: "LCMScheduler"

fps: 2 # The framerate to sample the input video
fps_ffmpeg: 2 # The framerate of the output video (FFMPEG interpolation will be used if greater than fps)
crf: 23 # A measure of quality - lower is better

######################################################
# ADDITIONAL
######################################################

ffmpeg_path: "/usr/bin/ffmpeg"
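For readers unfamiliar with the `frame_count`/`overlap_length` settings in configs like the one above: AnimateDiff generates the video in overlapping windows, where each new window reuses the last `overlap_length` frames of the previous one for temporal consistency. A minimal sketch of the windowing arithmetic (`window_starts` is a hypothetical helper for illustration, not part of the controlanimate API):

```python
# Illustrative only: compute the first frame index of each AnimateDiff
# window, given overlapping windows of frame_count frames that share
# overlap_length frames with the previous window.
def window_starts(total_frames: int, frame_count: int = 16, overlap_length: int = 8):
    """Return the starting frame index of each window; consecutive windows
    advance by (frame_count - overlap_length) frames."""
    step = frame_count - overlap_length
    return list(range(0, max(total_frames - overlap_length, 1), step))

# With 32 input frames, windows start at 0, 8, 16, each covering 16 frames
# and overlapping the previous window by 8 frames.
print(window_starts(32))
```

This is why a larger `overlap_length` improves consistency at the cost of more generation passes per second of video.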

intellerce commented 9 months ago

Hi, Thank you for pointing out a potential issue. Since I don't have your input file I cannot recreate it, but I have just updated the code with some changes and improvements that might help w/ getting better results.

SyedSherjeel commented 9 months ago

Hey, thanks for the response. What do you think might be the potential issue? I'll be super grateful.

> Hi, Thank you for pointing out a potential issue. Since I don't have your input file I cannot recreate it, but I have just updated the code with some changes and improvements that might help w/ getting better results.

  • Please try again with the new code and let me know if the issue persists - Especially try setting "do_initial_generation" to 0, disabling LCM-LoRA, using other schedulers like EulerDiscrete, and using other seed values (-1 for random seed) and see what you get...
  • About the VAE, you can use the base model's default VAE by setting the vae_path to "".
  • Btw, the sample results are made with the same VAE, so I am not entirely sure that what you see is due to the VAE.
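Per the suggestion above, falling back to the base model's default VAE would be just a one-line change in the config (illustrative fragment, matching the config keys used elsewhere in this thread):

```yaml
# Optional Alternative AutoEncoder (VAE)
# Leave empty to use the base model's bundled VAE instead of an alternative one:
vae_path: ""
```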
intellerce commented 9 months ago

I think it is the LCM LoRA that causes darker images. Try commenting it out and make sure to change the scheduler to Euler or DDIM and increase the steps. Also make sure to pull the latest codebase. Let me know how it goes.

SyedSherjeel commented 9 months ago

> I think it is the LCM LoRA that causes darker images. Try commenting it out and make sure to change the scheduler to Euler or DDIM and increase the steps. Also make sure to pull the latest codebase. Let me know how it goes.

Still the same issue.

[Screenshot 2023-11-27 at 10 53 41 AM]

Here is my video URL to reproduce the issue: https://drive.google.com/file/d/1zAdpqZCDz5yen8U_gR_x_PizJp1jpGaP/view?usp=share_link

intellerce commented 9 months ago

That's odd. Did you increase the CFG too? Try this config, and if you are still getting dull results we are missing something...!

# Config File:

######################################################
# INPUTS
######################################################
input_video_path: "tmp/beautiful_girl_test.mp4" # Path to the input video file
output_video_dir: "tmp/output"  # Directory to save the outputs

save_frames: 1 # 0: No, 1: Yes

# Width and Height of the Input and Output videos
# If zero the input's video dimension will be used otherwise the input will be resized
width: 512
height: 1024

prompt: "(Highest quality)++ colorful++ detailed++ beautiful++ (wonder woman)++ woman+ with (perfect pretty face)++ (female superhero)+, (perfect eyes)++, (fully dressed)++, (sharp features)+, (commanding posture)+, (perfect hands)++, (perfect fingers)++, (phenomenal aesthetic)+,  best quality, sharp focus, 8k, uhd, dslr, realistic photo, high quality" # (phenomenal aesthetic)+, masterpiece,
n_prompt: "easynegative+, pale++, (dark hands)++, muscular++, nudity, mask++, (bad face)++, (bad mouth)++, (worst quality)+, (missing hand)++, (missing fingers)++,,(low quality)+, lowres, bad anatomy, (text, font, logo, copyright, watermark)++, wrong face, wrong hands, wrong legs, wrong feet, (nsfw,nude)+, wrong fingers"

# Additional Basic Parameters
start_time: "00:00:00" # Time in HH:MM:SS format to start reading the input video
end_time: "00:00:10" # Time in HH:MM:SS format to stop reading the input video

# Use the last frame of each AnimateDiff output sequence as reference for the next epoch using the following img2img strength
overlap_strength: .95

# Native LCM Model?
use_lcm: 0 # 0: No, 1: Yes - Use original LCM model. If used, the settings related to LoRA and DreamBooth will be ignored as LCM models do not support them.
# It is worth noting that LCM does not support negative prompts at the moment.
# Also, note that LCM-LoRA is different from native LCM and its usage does not disable any other features.

use_img2img: 0 # 0: No, 1: Yes - Use img2img for non-overlapping frames. If no, then last output frame will be used as base for the added noise of non-overlapping frames.

######################################################
# MODELS
######################################################

# Base model that AnimateDiff uses to create the initial architecture from (it needs to be in HuggingFace format (.bin))
pretrained_model_path: "models/StableDiffusion/stable-diffusion-v1-5"

# Optional Alternative AutoEncoder (VAE)
vae_path: "models/VAE/vae-ft-mse-840000-ema-pruned.ckpt" #"models/VAE/vae-ft-ema-560000-ema-pruned.safetensors"

# Optional DreamBooth Model (full)
dreambooth_path: "models/DreamBooth_LoRA/dreamshaper_8.safetensors"  #"models/DreamBooth_LoRA/aZovyaRPGArtistTools_v3.safetensors" 

# Optional LoRA model to be used
lora_model_paths: 
  # - "models/DreamBooth_LoRA/lcm_lora.safetensors" 

lora_weights: 
  # - 0.5

# Motion Module to be used - versions of the config and the model must match
inference_config_path: "configs/inference/inference-v2.yaml"
motion_module: "models/Motion_Module/mm_sd_v15_v2.ckpt"

# ControlNets
# Optional ControlNet Models to be used - will be downloaded automatically
controlnets:
  - lllyasviel/control_v11p_sd15_openpose
  - lllyasviel/control_v11p_sd15_lineart
  - lllyasviel/control_v11p_sd15_mlsd
  - lllyasviel/control_v11p_sd15_canny
  - lllyasviel/control_v11p_sd15_softedge

cond_scale:
  - 1.0
  - 0.5
  - 1.0
  - 0.45
  - 0.25

guess_mode: 1 # 0: No, 1: Yes - To use guess mode in controlnet or not.

loop_back_frames: 1 # 0: No, 1: Yes - To use generated overlapping frames as inputs for the ControlNets or not.

######################################################
# PARAMETERS
######################################################

# IP-Adapter:
use_ipadapter: 0 # 0: No, 1: Yes
ipa_scale: 0.65 # Strength of IP-Adapter
do_initial_generation: 0 # 0: No, 1: Yes -> Generate a few initial frames to be used as baseline for the next image generations.

# Upscaling and Face Restoration:
upscale: 2 # Upscaler value for the input image
use_face_enhancer: 1 # 0: No, 1: Yes
upscale_first: 1 # Upscale before applying face enhancement - better results but slower: 0: No, 1: Yes

frame_count: 16 # How many co-related frames are produced by AnimateDiff - defaults to 16
overlap_length: 8 # Number of frames from previous output frames to be present in the current frames (helps with consistency)

seed: 41983 # Random Seed 29896 44737
steps: 30 # Denoising steps 
guidance_scale: 7.5
strength: 1.0 # Strength of the noise to be added to input latents - if 1.0 the img2img effect is nil

# Choice of scheduler: "DDIMScheduler", "EulerDiscreteScheduler" ,"DPMSolverMultistepScheduler","EulerAncestralDiscreteScheduler","LMSDiscreteScheduler","PNDMScheduler", "LCMScheduler"
scheduler: "EulerDiscreteScheduler" #"LCMScheduler"  

fps: 15 # The framerate to sample the input video
fps_ffmpeg: 30 # The framerate of the output video (FFMPEG interpolation will be used if greater than fps)
crf: 23 # A measure of quality - lower is better 

######################################################
# ADDITIONAL
######################################################

ffmpeg_path: "/usr/bin/ffmpeg"
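A side note on the timing settings in the config above: `start_time`/`end_time` select a clip in HH:MM:SS format, and `fps` determines how many frames are sampled from that clip. A quick sketch of the arithmetic (hypothetical helper, not the library's actual sampling code, which goes through FFmpeg):

```python
# Illustrative only: how many frames get sampled from the selected clip.
def sampled_frame_count(start_time: str, end_time: str, fps: float) -> int:
    def to_seconds(hhmmss: str) -> int:
        h, m, s = (int(x) for x in hhmmss.split(":"))
        return h * 3600 + m * 60 + s
    return int((to_seconds(end_time) - to_seconds(start_time)) * fps)

# The config above reads 10 seconds of video at fps: 15 -> 150 sampled frames,
# which FFmpeg then interpolates up to fps_ffmpeg: 30 on output.
print(sampled_frame_count("00:00:00", "00:00:10", 15))
```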
SyedSherjeel commented 9 months ago
[Screenshot 2023-11-27 at 7 03 25 PM]

This got weird.

intellerce commented 9 months ago

Okay, at least its colors are not dull anymore. You need to play with the configs, especially the ControlNet settings, prompts, overlap strength, and the seed value, to get your desired output. I will add a small guide to the README in the coming days. I think we can at least confirm that it is not a VAE issue.
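As a starting point for that ControlNet tuning, one could try relaxing the edge-based ControlNets relative to openpose so they constrain the stylization less (the values below are purely illustrative, not a recommendation from the maintainer):

```yaml
# Hypothetical tuning example: keep pose guidance strong, soften edge guidance
cond_scale:
  - 1.0   # openpose
  - 0.3   # lineart
  - 0.5   # mlsd
  - 0.3   # canny
  - 0.15  # softedge
```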

SyedSherjeel commented 9 months ago

Hey, I think that's it. Thank you, I'll look forward to it. Very grateful for the assistance.