Open 285220927 opened 9 months ago
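You can use oneflow_compiler_config to turn off mlir_enable_inference_optimization. This disables constant folding and removes part of the GPU memory overhead. (In earlier tests it cost about 5% in speed.)
In addition, you can call oneflow.cuda.empty_cache() to clear the cache of OneFlow's GPU memory pool.
For diffusers, you can try the newly added compile_pipe. It compiles a larger part of the pipeline and is faster: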
https://github.com/siliconflow/onediff/tree/main/onediff_diffusers_extensions#compile_pipe
After making that change, memory usage dropped by about 700 MiB, but it is still much higher than with PyTorch. Is this expected?
When using the pipe `from diffusers import StableDiffusionControlNetPipeline` together with `from onediffx import compile_pipe` or `from onediff.infer_compiler import oneflow_compile` in my code, I noticed a significant increase in GPU memory usage, from 13 GB to 18 GB, with a peak of 21 GB.
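To put the reported numbers in relative terms, the small helper below (the name `relative_increase_pct` is hypothetical) computes the overhead against the uncompiled 13 GB baseline, using only the figures quoted above.

```python
def relative_increase_pct(baseline_gb: float, observed_gb: float) -> float:
    """Percentage increase of observed_gb over baseline_gb."""
    if baseline_gb <= 0:
        raise ValueError("baseline must be positive")
    return (observed_gb - baseline_gb) / baseline_gb * 100.0


# Figures reported in this thread: 13 GB uncompiled, 18 GB compiled, 21 GB peak.
steady = relative_increase_pct(13, 18)
peak = relative_increase_pct(13, 21)
print(f"steady-state: +{steady:.1f}%  peak: +{peak:.1f}%")
# prints "steady-state: +38.5%  peak: +61.5%"
```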
What about running oneflow.cuda.empty_cache()?
Yes, the compiled VAE/ControlNet take extra memory. We are working on reducing this cost in a future version, but it will take some time.
@onefish51
It is still useful. Previously, torch.cuda.empty_cache() had no effect, but oneflow.cuda.empty_cache() did free memory. However, the peak GPU memory usage during a run stays the same.
Looking forward to the future version.
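A plausible reason torch.cuda.empty_cache() had no effect is that OneFlow keeps its own GPU memory pool, separate from PyTorch's caching allocator, so each framework has to release its own cache. The sketch below assumes both libraries expose a torch-style `cuda.is_available()` / `cuda.empty_cache()` pair; the helper name `empty_framework_caches` is hypothetical. Note that emptying a cache only returns memory that is idle between runs; it does not lower the peak reached during a run, which matches the observation above.

```python
import importlib
from typing import Dict


def empty_framework_caches() -> Dict[str, bool]:
    """Ask each installed framework to release its cached, unused GPU memory.

    PyTorch and OneFlow manage separate memory pools, so calling only
    torch.cuda.empty_cache() leaves OneFlow's pool untouched. Frameworks that
    are not installed, or that see no CUDA device, are reported as False.
    """
    results: Dict[str, bool] = {}
    for name in ("torch", "oneflow"):
        try:
            module = importlib.import_module(name)
        except ImportError:
            results[name] = False
            continue
        cuda = getattr(module, "cuda", None)
        if cuda is None or not cuda.is_available():
            results[name] = False
            continue
        cuda.empty_cache()  # releases only this framework's cache
        results[name] = True
    return results
```

Calling `empty_framework_caches()` after a generation returns a dict such as `{"torch": True, "oneflow": True}` on a machine where both frameworks see a GPU.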
@onefish51 @285220927
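Please refer to the following example to compile StableDiffusionXLControlNetInpaintPipeline with the nexfort backend. The example uses StableDiffusionImg2ImgPipeline, but the compile_pipe call is the same.
This should use less CUDA memory than before.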
```python
import argparse

import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline
from onediffx import compile_pipe

prompt = "sea,beach,the waves crashed on the sand,blue sky with white clouds"


def parse_args():
    parser = argparse.ArgumentParser(description="Simple demo of image generation.")
    parser.add_argument(
        "--model_id", type=str, default="stabilityai/stable-diffusion-2-1",
    )
    return parser.parse_args()


args = parse_args()

# Load the pipeline in fp16 and move it to the GPU.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    args.model_id, use_auth_token=True, revision="fp16", torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Compile the pipeline with the nexfort backend.
options = '{"mode": "max-optimize:max-autotune:low-precision", "memory_format": "channels_last"}'
pipe = compile_pipe(pipe, backend="nexfort", options=options, fuse_qkv_projections=True)

# Solid-color 512x512 input image for img2img.
img = Image.new("RGB", (512, 512), "#1f80f0")

# nexfort runs on PyTorch, so use torch.autocast (oneflow is not imported here).
with torch.autocast("cuda"):
    images = pipe(
        prompt, image=img, guidance_scale=10, num_inference_steps=100, output_type="np",
    ).images

# Convert each HxWx3 array to a PIL image and save it.
for i, image in enumerate(images):
    pipe.numpy_to_pil(image)[0].save(f"{prompt}-of-{i}.png")
```
Describe the bug
When testing SDXL ControlNet inpaint, OneDiff's GPU memory usage was nearly 50% higher than PyTorch's.
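When comparing OneDiff against PyTorch, torch.cuda.max_memory_allocated() only tracks PyTorch's own allocator, so memory held in OneFlow's pool would not appear in it. One option is to poll device-wide usage through torch.cuda.mem_get_info(), which reports free and total memory for the whole GPU, while the pipeline runs. This is a sketch under those assumptions; the helper names are hypothetical, polling can miss very short spikes, and the figure includes any other process on the same GPU.

```python
import importlib
import threading
from typing import Any, Callable, List, Optional, Tuple


def device_memory_used_mib() -> Optional[float]:
    """Device-wide used GPU memory in MiB, or None if torch/CUDA is unavailable."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    # mem_get_info reports device-level free/total bytes, so memory held by
    # any allocator (PyTorch, OneFlow, ...) counts as used.
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    return (total_bytes - free_bytes) / 2**20


def run_with_peak_memory(
    fn: Callable[[], Any], interval_s: float = 0.05
) -> Tuple[Any, Optional[float]]:
    """Run fn() while a background thread samples device memory.

    Returns (fn's result, highest sampled usage in MiB or None).
    """
    samples: List[float] = []
    stop = threading.Event()

    def poll() -> None:
        while not stop.is_set():
            used = device_memory_used_mib()
            if used is None:
                return  # nothing to sample without a CUDA device
            samples.append(used)
            stop.wait(interval_s)

    sampler = threading.Thread(target=poll, daemon=True)
    sampler.start()
    try:
        result = fn()
    finally:
        stop.set()
        sampler.join()
    return result, (max(samples) if samples else None)
```

Running the same call once with the plain PyTorch pipeline and once with the compiled one, for example `images, peak = run_with_peak_memory(lambda: pipe(...).images)`, gives peaks measured the same way for both.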
Your environment
- OS: Ubuntu 20.04
- GPU: NVIDIA GeForce RTX 4090
- Python: 3.8
- diffusers: 0.23.0
- onediff: 0.12.1.dev202401310124
- PyTorch: 2.0.1
- CUDA: 12.2
OneDiff git commit id
OneFlow version info
Output of `python -m oneflow --doctor`:
- version: 0.9.1.dev20240125+cu122
- git_commit: 6458a12
- cmake_build_type: Release
- rdma: True
- mlir: True
- enterprise: False

How To Reproduce
Steps to reproduce the behavior (code or script):
The complete result
pytorch
onediff