nateraw / stable-diffusion-videos

Create 🔥 videos with Stable Diffusion by exploring the latent space and morphing between text prompts
Apache License 2.0

Using it with latest stable diffusion 2 #162

Closed javismiles closed 1 year ago

javismiles commented 1 year ago

Hi everybody, is there a way to make this great repo work with the latest stable diffusion 2 version? I tried to replace the model id with: stabilityai/stable-diffusion-2 but then I get errors when using some of the library methods (which doesn't happen when using 1.4 or 1.5 models) thank you for any tips

Atomic-Germ commented 1 year ago

@javismiles, would you please share the errors you're receiving, your environment, and which of the methods you're trying to use?

I am trying things manually with stabilityai/stable-diffusion-2-1 and seeing no issues at all yet.

javismiles commented 1 year ago

@Atomic-Germ thank you very much for your message, this is the gist of what I'm using:

pipeline = StableDiffusionWalkPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    torch_dtype=torch.float16,
    revision="fp16",
).to("cuda")

and the error that appears immediately when running that is this:

ValueError: Pipeline <class 'stable_diffusion_videos.stable_diffusion_pipeline.StableDiffusionWalkPipeline'> expected {'feature_extractor', 'vae', 'scheduler', 'safety_checker', 'text_encoder', 'unet', 'tokenizer'}, but only {'text_encoder', 'unet', 'tokenizer', 'vae', 'scheduler'} were passed

what do you think? I'm not running it in Gradio, I'm running it manually in a Jupyter notebook to have full control over things, but again the issue is super simple: I run the instruction above and get the error shown above

nateraw commented 1 year ago

Set feature_extractor=None and safety_checker=None in the from_pretrained call.

That should solve your problem.
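For example, mirroring the call from your message (a sketch; only the two None arguments are new):

pipeline = StableDiffusionWalkPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    feature_extractor=None,   # the SD 2.x checkpoints don't ship these two components
    safety_checker=None,
    torch_dtype=torch.float16,
    revision="fp16",
).to("cuda")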

javismiles commented 1 year ago

@nateraw yes it works, thank you very much!

javismiles commented 1 year ago

@nateraw sorry, I have a new issue. Thanks to your change, generating single images now works perfectly with 2.1, using:

image = pipeline(prompt, height=448, width=800, guidance_scale=7.5, num_inference_steps=50, generator=generator).images[0]

however when I now try to do this (which is what I need):

video_path = pipeline.walk(
    ["whatever", "whatever", "whatever"],
    [1111, 57, 55],
    upsample=True,
    fps=5,
    num_interpolation_steps=90,
    height=448,
    width=800,
)
visualize_video_colab(video_path)

it generates everything black: in the "dreams" folder, all the images come out black.

Single-image generations with the pipeline itself are perfect, but if I use pipeline.walk to generate multiple images for interpolation, then everything in the dreams folder comes out black and the video is black as well.

How can I fix it? Thank you :)

javismiles commented 1 year ago

@nateraw

so this works perfectly and generates a great image with SD 2.1:

prompt = "whatever"
generator = torch.Generator("cuda").manual_seed(1111)
image = pipeline(prompt, height=512, width=512, guidance_scale=7.5, num_inference_steps=50, generator=generator).images[0]

but the following generates an all-black result with the very same prompt and the same SD 2.1 pipeline. There are no errors; it generates images, but they are fully black:

video_path = pipeline.walk(
    ["whatever", "whatever", "whatever"],
    [1111, 343, 57],
    upsample=True,
    fps=5,
    num_interpolation_steps=5,
    height=512,
    width=512,
)

and what I ultimately need is to do the walk for interpolation, so please let me know how I can fix it. Thank you :)

javismiles commented 1 year ago

@nateraw and this is how I initialized the pipeline:

from stable_diffusion_videos import StableDiffusionWalkPipeline, Interface

pipeline = StableDiffusionWalkPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    feature_extractor=None,
    safety_checker=None,
    torch_dtype=torch.float16,
    revision="fp16",
).to("cuda")

interface = Interface(pipeline)

In case it can help, this is how the pipeline is represented in my code:

<bound method StableDiffusionWalkPipeline.walk of StableDiffusionWalkPipeline {
  "_class_name": "StableDiffusionWalkPipeline",
  "_diffusers_version": "0.11.1",
  "feature_extractor": [null, null],
  "safety_checker": [null, null],
  "scheduler": ["diffusers", "DDIMScheduler"],
  "text_encoder": ["transformers", "CLIPTextModel"],
  "tokenizer": ["transformers", "CLIPTokenizer"],
  "unet": ["diffusers", "UNet2DConditionModel"],
  "vae": ["diffusers", "AutoencoderKL"]
}>

nateraw commented 1 year ago

Yeah, interesting. I actually ran into this last night as well. I don't believe it was happening before... not 100% sure of that, though.

My guess is it has to do with the scheduler and perhaps how we're handling scheduling here. Will investigate.

In the meantime, maybe give stable diffusion 2.1 base a try.
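(For reference, that would just mean swapping the model id in from_pretrained; a minimal sketch, assuming the 512px base checkpoint is published as stabilityai/stable-diffusion-2-1-base on the Hub:)

pipeline = StableDiffusionWalkPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base",  # assumed id of the 512px base checkpoint
    feature_extractor=None,
    safety_checker=None,
    torch_dtype=torch.float16,
).to("cuda")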


btw, as an aside, I'm trying to fix the init here so we don't have to do the feature_extractor=None business in from_pretrained. I made a separate issue for it at #165.

javismiles commented 1 year ago

@nateraw thank you for the reply, interesting. Fingers crossed that you find a way to fix it :) When you say "maybe give stable diffusion 2.1 base a try", what do you mean? I already used SD 2.1 without problems to produce static images, all good; the problem is when trying to produce the interpolated videos with it :)

nateraw commented 1 year ago

I just fixed the from_pretrained issue on main branch with #165 . If you install the main branch of the repo, perhaps the following script will work for you:

pip install git+https://github.com/nateraw/stable-diffusion-videos
import torch

from stable_diffusion_videos import StableDiffusionWalkPipeline
from diffusers import DPMSolverMultistepScheduler

device = "mps" if torch.backends.mps.is_available() else "cuda" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if device == "cuda" else torch.float32
pipe = StableDiffusionWalkPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    torch_dtype=torch_dtype,
).to(device)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

pipe.walk(
    prompts=['a cat', 'a dog'],
    seeds=[1234, 4321],
    num_interpolation_steps=5,
    num_inference_steps=50,
    fps=5,
)
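
As a side note (and assuming visualize_video_colab is the same helper used earlier in the thread), walk returns the path of the rendered video, so you can assign it and preview the result as before:

video_path = pipe.walk(
    prompts=['a cat', 'a dog'],
    seeds=[1234, 4321],
    num_interpolation_steps=5,
    num_inference_steps=50,
    fps=5,
)  # walk returns the filepath of the rendered mp4
visualize_video_colab(video_path)  # the Colab preview helper used earlier in this thread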

pls let me know :)

I'll copy this into colab and try it when I get a few mins

nateraw commented 1 year ago

Ok I just threw it in colab and it seemed to work.

Open In Colab

Or here's the gist link directly, if you prefer to read it that way.

javismiles commented 1 year ago

@nateraw yes indeed, it works :) I've now tried it properly and it works, thank you very much :)

javismiles commented 1 year ago

@nateraw it works, it just throws out a lot of these warning messages:

Forward upsample size to force interpolation output size.
Forward upsample size to force interpolation output size.

is it possible to hide them in some way? thank you :)

nateraw commented 1 year ago

Ah, this may be because of #156; I changed the logging verbosity to INFO by default in that file.

Should be able to suppress by changing the diffusers logging verbosity.

Throw this at the top of your script:

from diffusers.utils import logging

logging.set_verbosity_warning()

Refer to docs here for more info.
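
(If they're still too noisy, the same logging module also has set_verbosity_error(), if I recall the helpers correctly, which silences warnings as well:)

logging.set_verbosity_error()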

nateraw commented 1 year ago

If this issue is resolved please close it 😄

Feel free to open more issues if you run into anything else or have more questions!! ❤️

javismiles commented 1 year ago

@nateraw thank you for your fantastic help :)

virtualmartire commented 1 year ago

If I use the script from nateraw's comment above, the pipeline walks properly but produces only "random" images (like the ones in a kaleidoscope). Does somebody know why?

[attached example frame: frame000000]