MikeHanKK commented 7 months ago

Describe the bug


If I use onediff to generate a batch of images with different sizes, it very easily runs out of CUDA memory. The original diffusers implementation doesn't have this problem.

Your environment

OS

ubuntu

pytorch-lightning 1.4.2
torch 2.1.0
torch-fidelity 0.3.0
torchaudio 2.1.0
torchmetrics 1.0.2
torchvision 0.16.0
diffusers 0.26.2
Python 3.10.9

OneDiff git commit id

version: 0.9.1.dev20240413+cu118
git_commit: 67341c9
cmake_build_type: Release
rdma: True
mlir: True
enterprise: False

OneFlow version info

Run python -m oneflow --doctor and paste it here.

How To Reproduce

Steps to reproduce the behavior(code or script):

import time

import torch
from diffusers import AutoPipelineForText2Image, AutoencoderKL, StableDiffusionXLPipeline, DPMSolverMultistepScheduler
from onediff.infer_compiler import oneflow_compile

model_path = '../models/Stable-diffusion/sd_xl_base_1.0'
vae_path = '../models/VAE/sdxl-vae-fp16-fix'

vae = AutoencoderKL.from_pretrained(vae_path, torch_dtype=torch.float16)

pipeline = StableDiffusionXLPipeline.from_pretrained(model_path, torch_dtype=torch.float16, vae=vae, variant="fp16", add_watermarker=False).to('cuda')

pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config, use_karras_sigmas=True)

# pipeline = compile_pipe(pipeline)

print("unet is compiled to oneflow.")
pipeline.unet = oneflow_compile(pipeline.unet)
print("vae is compiled to oneflow.")
pipeline.vae.decoder = oneflow_compile(pipeline.vae.decoder)

prompt_lst = [
    "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k",
    "picture of elf,an extremely delicate and beautiful face,in the forest",
    "(((cute anime rabbit))),bunny,chinese new year,year of the rabbit,sharp focus,perfect composition,volumetric lighting,dynamic",
    "A cyberpunk city,by Qibaishi,ink wash painting,perfect composition",
    "The streets of a beautiful Mediterranean town with many tourists,masterfully composed",
    "Water color, Switzerland, skiing, glass house, morning, cableway",
    "Blue sea, beach with few coconut trees, less people and water houses on the beach",
    "Cyberpunk Oriental Myth future world Miracle forest Sunshine Colorful Clouds Wizard of Oz Flying Birds Green Light",
    "Girl, stunning face, cartoon style, dynamic lighting, intricate details",
    "A group of adult women are walking, petals fall from the sky, 1080p, dense forest, sun through the shade of the tree, a child is running, big vision, photography"
]

sizes = [1024, 1152, 1360, 1536]
for i, prompt in enumerate(prompt_lst):
    for h in sizes:
        for w in sizes:
            s = time.time()
            images = pipeline(prompt=prompt, guidance_scale=8.0, num_inference_steps=30, num_images_per_prompt=1, width=w, height=h).images
            print('text2img', time.time() - s)
            print(images)
            torch.cuda.empty_cache()

            # Report how much GPU memory is still unreserved after this run
            total = torch.cuda.get_device_properties(0).total_memory / 1_000_000_000
            r = torch.cuda.memory_reserved(0) / 1_000_000_000
            print(f'available cuda memory ={total - r} GB')
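For orientation, the nested loops in the script run every prompt at every height/width pair taken from sizes. The sketch below needs no GPU and simply counts those runs, along with how many of the pairs have more pixels than the 1024 * 1024 area mentioned later in this thread. The helper name resolution_grid is made up for illustration and is not part of onediff or diffusers.

```python
from itertools import product

sizes = [1024, 1152, 1360, 1536]
num_prompts = 10  # number of entries in prompt_lst in the script


def resolution_grid(sizes):
    """Return every (height, width) pair in the same order as the nested loops."""
    return list(product(sizes, repeat=2))


shapes = resolution_grid(sizes)
# Pairs whose pixel count exceeds 1024 * 1024
larger = [(h, w) for h, w in shapes if h * w > 1024 * 1024]

print(len(shapes))                # 16 distinct (height, width) pairs per prompt
print(len(shapes) * num_prompts)  # 160 pipeline calls in total
print(len(larger))                # 15 pairs are larger than 1024 * 1024
```

Only the first pair, 1024 x 1024 itself, stays at or below that area, so nearly every call in the reproduction script generates a larger image.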

The complete error message

terminate called after throwing an instance of 'oneflow::RuntimeException'
what(): Error: CUDA out of memory. Tried to allocate 3.1 GB
You can set ONEFLOW_DEBUG or ONEFLOW_PYTHON_STACK_GETTER to 1 to get the Python stack of the error.
Stack trace (most recent call last) in thread 26504:
Object "/home/ubuntu/aidazuo/dazuovenv/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-f7b6ba13.so", at 0x7faba720dce7, in
Object "/home/ubuntu/aidazuo/dazuovenv/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-f7b6ba13.so", at 0x7faba720d55c, in
Object "/home/ubuntu/aidazuo/dazuovenv/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-f7b6ba13.so", at 0x7faba7208dd8, in vm::ThreadCtx::TryReceiveAndRun()
Object "/home/ubuntu/aidazuo/dazuovenv/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-f7b6ba13.so", at 0x7faba71ab574, in vm::EpStreamPolicyBase::Run(vm::Instruction) const
Object "/home/ubuntu/aidazuo/dazuovenv/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-f7b6ba13.so", at 0x7faba71ae877, in vm::Instruction::Compute()
Object "/home/ubuntu/aidazuo/dazuovenv/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-f7b6ba13.so", at 0x7faba7235bcf, in vm::FuseInstructionPolicy::Compute(vm::Instruction)
Object "/home/ubuntu/aidazuo/dazuovenv/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-f7b6ba13.so", at 0x7faba71ae877, in vm::Instruction::Compute()
Object "/home/ubuntu/aidazuo/dazuovenv/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-f7b6ba13.so", at 0x7faba71b5c58, in vm::OpCallInstructionPolicy::Compute(vm::Instruction*)
Object "/home/ubuntu/aidazuo/dazuovenv/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-f7b6ba13.so", at 0x7faba71b5929, in
Object "/home/ubuntu/aidazuo/dazuovenv/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-f7b6ba13.so", at 0x7faba71b0a7a, in
Object "/home/ubuntu/aidazuo/dazuovenv/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-f7b6ba13.so", at 0x7fab9e994d3c, in

Additional context

[Attached image: img_v3_029v_2025e92c-43f5-4a2a-8c24-a36ae1f4e1fg]

strint commented 6 months ago

Compiling the VAE currently takes a lot of memory when generating images larger than 1024 * 1024. Please avoid compiling it in the current version by commenting out this line:

# pipeline.vae.decoder = oneflow_compile(pipeline.vae.decoder)

When using compile_pipe, do this to skip compiling the VAE:

pipe = compile_pipe(pipe, ignores=("vae",))
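Applied to the script in the original report, this advice amounts to compiling the UNet and leaving the VAE decoder as a regular PyTorch module. Here is a minimal sketch of that idea. The name compile_unet_only is hypothetical and not part of onediff, and the compile function is passed in as a parameter (for example the oneflow_compile imported in the original script) so the helper itself imports nothing.

```python
def compile_unet_only(pipeline, compile_fn):
    """Compile pipeline.unet with compile_fn and leave pipeline.vae untouched.

    compile_fn is assumed to take a module and return its compiled
    counterpart, as oneflow_compile is used in the original script.
    """
    pipeline.unet = compile_fn(pipeline.unet)
    # Deliberately no compile_fn(pipeline.vae.decoder) here: the VAE decoder
    # stays uncompiled, following the advice above.
    return pipeline
```

In the reproduction script this would replace the two oneflow_compile assignments with a single call, pipeline = compile_unet_only(pipeline, oneflow_compile). Whether this is enough to avoid the out-of-memory error on a given GPU would still need to be checked.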

Cherwayway commented 6 months ago

I ran into this problem in the stable diffusion webui extension too. How should I avoid it there?

strint commented 4 months ago