Closed AbhinavGopal closed 7 months ago
Cc: @DN6
What happens when you delete the pipe object and then decode the latents?
When I delete the pipe object, I still run out of memory. Here's what I'm running:
```python
import gc
from io import BytesIO

import imageio
import PIL.Image
import requests
import torch
from diffusers import AnimateDiffVideoToVideoPipeline, AutoencoderKL, MotionAdapter


def load_video(file_path: str):
    images = []
    if file_path.startswith(("http://", "https://")):
        # If the file_path is a URL
        response = requests.get(file_path, timeout=10)
        response.raise_for_status()
        content = BytesIO(response.content)
        vid = imageio.get_reader(content)
    else:
        # Assuming it's a local file path
        vid = imageio.get_reader(file_path)

    for frame in vid:
        pil_image = PIL.Image.fromarray(frame).convert("RGB")
        images.append(pil_image)

    return images


adapter = MotionAdapter.from_pretrained(
    "guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16
).to("cuda")
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema", torch_dtype=torch.float16).to("cuda")
pipe = AnimateDiffVideoToVideoPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V6.0_B1_noVAE", motion_adapter=adapter, torch_dtype=torch.float16
).to("cuda")

video = load_video("https://i.makeagif.com/media/8-25-2022/qpOyQo.gif")[:16]
prompt = "red countdown timer"
guidance_scale = 7.5
strength = 0.8
width, height = 512, 512
output_type = "latent"

combined_outputs = pipe(
    prompt=prompt,
    video=video,
    guidance_scale=guidance_scale,
    strength=strength,
    width=width,
    height=height,
    output_type=output_type,
    num_inference_steps=5,
).frames

del pipe
gc.collect()
torch.cuda.empty_cache()

# Copied from diffusers AnimateDiffVideoToVideoPipeline
latents = 1 / vae.config.scaling_factor * combined_outputs
batch_size, channels, num_frames, height, width = latents.shape
latents = latents.permute(0, 2, 1, 3, 4).reshape(batch_size * num_frames, channels, height, width)
image = vae.decode(latents).sample
video = image[None, :].reshape((batch_size, num_frames, -1) + image.shape[2:]).permute(0, 2, 1, 3, 4)
# we always cast to float32 as this does not cause significant overhead and is compatible with bfloat16
video = video.float()
```
Just taking a wild guess, but can you try something like:
```python
with torch.no_grad():
    images = pipe.decode_latents(combined_outputs)
```
Additionally, before running it, it might help to run Python garbage collection and empty the torch CUDA cache.
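Roughly something like this sketch (assuming `pipe` and `combined_outputs` from your snippet are still in scope, i.e. the pipeline has not been deleted):

```python
import gc

import torch

# Python garbage collection plus releasing cached CUDA memory before decoding
gc.collect()
torch.cuda.empty_cache()

# Decode the latents without building an autograd graph
with torch.no_grad():
    images = pipe.decode_latents(combined_outputs)
```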
Works! Good guess HAHA
Describe the bug
When I use the AnimateDiffVideoToVideoPipeline to output images, my server doesn't run out of memory. However, if I output the latents and then manually run pipe.decode_latents, I somehow run out of memory.
Reproduction
Code with manual latent decoding: see the snippet posted in the thread above (output_type="latent" followed by a manual vae.decode of the returned latents).
I get the error that is in the logs pasted below.
But if I directly run the pipe to get image outputs, I have no error.
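For comparison, the direct-image path that works fine for me is roughly this (same setup as the snippet above, just without output_type="latent" and the manual decode):

```python
# Hypothetical comparison run: let the pipeline decode internally.
# Same `pipe`, `video`, and parameters as in the snippet above; output_type is
# left at its default ("pil"), so the VAE decode happens inside the pipeline call.
frames = pipe(
    prompt="red countdown timer",
    video=video,
    guidance_scale=7.5,
    strength=0.8,
    width=512,
    height=512,
    num_inference_steps=5,
).frames
```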
I have about 22.5 GB of VRAM available on my server.
Logs
System Info
diffusers version: 0.28.0.dev0
Who can help?
@DN6 @saya