Diffusers StableDiffusionInpaintPipeline OOM

Someant commented 4 months ago

Describe the bug

Running diffusers with OneDiff results in an OOM (out of memory) error.

Your environment

OS

ubuntu 22.04

OneDiff git commit id

34f98c43d7a146349d7823238e866e2145cd8868

OneFlow version info

python -m oneflow --doctor
version: 0.9.1.dev20240529+cu122
git_commit: ec7b682
cmake_build_type: Release
rdma: True
mlir: True
enterprise: True

How To Reproduce

Steps to reproduce the behavior (code or script):

import torch
from diffusers import AutoPipelineForInpainting, AutoPipelineForImage2Image, StableDiffusionPipeline,StableDiffusionInpaintPipeline,DPMSolverSinglestepScheduler
from diffusers.utils import load_image, make_image_grid
from onediffx import compile_pipe
from onediff.infer_compiler import oneflow_compile, OneflowCompileOptions

# pipeline = AutoPipelineForInpainting.from_pretrained(
#     "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16"
# )

pipeline = StableDiffusionInpaintPipeline.from_single_file("/data/models/checkpoint/sd-v1-5-inpainting.safetensors",)
# pipeline.scheduler = DPMSolverSinglestepScheduler(use_karras_sigmas = True)

pipeline = pipeline.to("cuda")
# pipeline = compile_pipe(pipeline, ignores=("vae.encoder", "vae.decoder"))
pipeline.unet = oneflow_compile(pipeline.unet)

# pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
# pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

for i in range(4):
    print("i", i)
    prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
    image_inpainting = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, num_inference_steps = 25).images[0]

# resize image to 1024x1024 for SDXL
image_inpainting = image_inpainting.resize((1024, 1024))
image_inpainting.save("./test.jpg")

The complete error message

W20240529 17:16:06.817147 196748 cuda_stream.cpp:49] Runtime version 12.1 of cuBLAS incompatible with compiletime version 12.2.
W20240529 17:16:06.818517 196748 cuda_stream.cpp:49] Runtime version 8.5 of cuDNN incompatible with compiletime version 8.9.
F20240529 17:16:12.009701 196601 conv2d_tuning_warmup_pass.cpp:167] Check failed: cudaMalloc(&workspace, workspace_size) : out of memory (2)
Check failure stack trace:
    @ 0x7d20f32a6fba google::LogMessage::Fail()
    @ 0x7d20f32a9ef1 google::LogMessage::SendToLog()
    @ 0x7d20f32a6ae9 google::LogMessage::Flush()
    @ 0x7d20f32aa7d9 google::LogMessageFatal::~LogMessageFatal()
    @ 0x7d20ea088d8f _ZZNK7oneflow12_GLOBAL__N_122Conv2dTuningWarmupPass5ApplyEPNS_3JobEPNS_10JobPassCtxEENKUlPKNS_6OpNodeEE1clES8
    @ 0x7d20ea08a2ae oneflow::(anonymous namespace)::Conv2dTuningWarmupPass::Apply()
    @ 0x7d20e9eb8454 _ZZN7oneflow23LazyJobBuildAndInferCtx8CompleteEvENKUlRKSsiE2_clES2_i
    @ 0x7d20e9ebdb7d oneflow::LazyJobBuildAndInferCtx::Complete()
    @ 0x7d22136422b6 oneflow::CurJobBuildAndInferCtx_Complete()
    @ 0x7d221364310b (unknown)
    @ 0x7d2213397c88 (unknown)
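
The first two warnings in the log report that the cuBLAS and cuDNN runtime versions differ from the versions OneFlow was compiled against. Whether that mismatch is related to the OOM is not established here, but it is easy to overlook in a long log. A minimal sketch for pulling such warnings out of a log, assuming they always follow the wording seen above (the function name `find_version_mismatches` is made up for illustration and is not part of OneFlow):

```python
import re

# Matches warnings worded like the ones in the log above, e.g.
# "Runtime version 12.1 of cuBLAS incompatible with compiletime version 12.2."
MISMATCH_RE = re.compile(
    r"Runtime version (?P<runtime>\d+(?:\.\d+)*) of (?P<lib>\w+) "
    r"incompatible with compiletime version (?P<compiled>\d+(?:\.\d+)*)"
)


def find_version_mismatches(log_text):
    """Return (library, runtime_version, compiletime_version) tuples found in log_text."""
    return [
        (m.group("lib"), m.group("runtime"), m.group("compiled"))
        for m in MISMATCH_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = (
        "W20240529 17:16:06.817147 196748 cuda_stream.cpp:49] Runtime version 12.1 "
        "of cuBLAS incompatible with compiletime version 12.2.\n"
        "W20240529 17:16:06.818517 196748 cuda_stream.cpp:49] Runtime version 8.5 "
        "of cuDNN incompatible with compiletime version 8.9.\n"
    )
    for lib, runtime, compiled in find_version_mismatches(sample):
        print(f"{lib}: runtime {runtime}, compiled against {compiled}")
```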

clackhan commented 4 months ago

@Someant Can you tell me the GPU model in your environment? In my environment (NVIDIA A100-PCIE-40GB), the peak GPU memory consumption while running the code above is 6 GB, and no OOM occurs.

Someant commented 4 months ago

RTX 4090, CUDA version 12.2. I upgraded the driver to 12.5 yesterday and the OOM no longer occurs. But I found a new problem: the performance improvement is only about 20% (PyTorch: 9.74 it/s, OneDiff: 11.4 it/s).
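
One thing worth ruling out when comparing it/s figures like these: the first calls through a compiled UNet may include compilation and tuning time, so they should not count toward the measurement. A minimal timing sketch, assuming the step to measure can be wrapped in a plain callable (the helper `measure_its_per_second` and its parameters are hypothetical, not part of diffusers or OneDiff):

```python
import time


def measure_its_per_second(step, iterations=25, warmup=3, sync=None):
    """Time `step()` and return iterations per second, excluding warm-up calls.

    `sync` is an optional callable invoked before each clock read. On a GPU it
    would typically be torch.cuda.synchronize, so that queued kernels have
    finished before the timer starts and stops (an assumption about the setup).
    """
    for _ in range(warmup):
        step()  # warm-up calls: any compilation or autotuning cost lands here
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(iterations):
        step()
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start
    return iterations / elapsed
```

With the script above, `step` could be a lambda that runs one pipeline call; the same helper would then be used for both the plain PyTorch pipeline and the compiled one so the two numbers are measured the same way.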

clackhan commented 4 months ago

This conflicts with what I see. In my environment (also an RTX 4090, CUDA version 12.4): PyTorch: 37.84 it/s, OneDiff: 85.33 it/s.
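
For reference, the two sets of figures work out to about 1.17x (9.74 to 11.4 it/s) versus about 2.26x (37.84 to 85.33 it/s). The PyTorch baselines alone differ by almost 4x on the same GPU model, which suggests the two setups are not directly comparable. One possibility, stated here only as a guess, is precision: the reproduction script loads the checkpoint without `torch_dtype=torch.float16`. The arithmetic, as a small sketch:

```python
def speedup(baseline_its, optimized_its):
    """Return (ratio, percent_gain) of optimized throughput over the baseline."""
    ratio = optimized_its / baseline_its
    return ratio, (ratio - 1.0) * 100.0


if __name__ == "__main__":
    # Throughput figures (it/s) as reported in this thread.
    reports = [
        ("Someant, RTX 4090", 9.74, 11.4),
        ("clackhan, RTX 4090", 37.84, 85.33),
    ]
    for label, pytorch_its, onediff_its in reports:
        ratio, gain = speedup(pytorch_its, onediff_its)
        print(f"{label}: {ratio:.2f}x ({gain:.1f}% faster)")
```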

onediff commit id: https://github.com/siliconflow/onediff/commit/4b72f723ed7c0c028ae86df8abe66adfcb95a086

oneflow version:

python -m oneflow --doctor
path: ['/home/hanbinbin/anaconda3/envs/py10/lib/python3.10/site-packages/oneflow']
version: 0.9.1.dev20240530+cu122
git_commit: ec7b682
cmake_build_type: Release
rdma: True
mlir: True
enterprise: True

strint commented 2 months ago

@Someant Anything new?

strint commented 2 months ago

@Someant It's too old to follow, please feel free to reopen it.

strint commented 2 months ago

Try this: https://github.com/siliconflow/onediff/issues/848#issuecomment-2100047744
