siliconflow / onediff

OneDiff: An out-of-the-box acceleration library for diffusion models.
https://github.com/siliconflow/onediff/wiki
Apache License 2.0
1.67k stars 100 forks

Dynamic Resolution Compilation #94

Closed chavinlo closed 1 year ago

chavinlo commented 1 year ago

Hello, is it possible to compile the model for dynamic-resolution generation rather than a static one, similar to TensorRT? I see in the code that the compilation call is made either if the model hasn't been compiled already, or if the request is for a different resolution than the one already compiled.

strint commented 1 year ago

Compiling a dynamic-shape graph is not supported right now. We use some optimization techniques that depend on a static input shape.

Have you met some problems with static shapes?

chavinlo commented 1 year ago

Have you met some problems with static shapes?

No, but we are building a service that plans to offer dynamic resolutions. I was thinking of compiling it for multiple resolutions, but that would take up VRAM.
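The multi-resolution idea above could be sketched as a per-resolution cache of compiled graphs. This is only an illustration of the trade-off being discussed: `compile_for_resolution` is a hypothetical stand-in, not a real OneDiff API.

```python
# Hypothetical sketch: keep one compiled graph per supported resolution,
# trading extra VRAM (one graph per shape) for zero switch latency.
# `compile_for_resolution` is a stand-in, not a real OneDiff entry point.

def compile_for_resolution(model, height, width):
    # A real implementation would trace/compile `model` for this static
    # input shape; here we just return a tagged wrapper for illustration.
    def compiled(prompt):
        return f"{prompt} @ {height}x{width}"
    return compiled

class MultiResolutionRunner:
    def __init__(self, model, resolutions):
        # Eagerly compile every supported shape up front.
        self._graphs = {
            (h, w): compile_for_resolution(model, h, w)
            for (h, w) in resolutions
        }

    def __call__(self, prompt, height, width):
        # Dispatch to the graph compiled for this exact shape.
        return self._graphs[(height, width)](prompt)

runner = MultiResolutionRunner(None, [(512, 512), (768, 768), (1024, 1024)])
print(runner("astronaut", 768, 768))
```

A real service would likely bound the set of supported resolutions, since each cached graph holds its own device memory.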

chavinlo commented 1 year ago

Please let me know if this feature is ever implemented. Thanks

strint commented 1 year ago

Please let me know if this feature is ever implemented. Thanks

We provide an offline-compile mode to reduce online-compile time for scenarios where the input shapes of online inference are limited. Hope this will help: https://github.com/Oneflow-Inc/diffusers/wiki/How-to-Run-OneFlow-Stable-Diffusion#optimization-for-multi-resolution-picture

@chavinlo

chavinlo commented 1 year ago

Please let me know if this feature is ever implemented. Thanks

We provide an offline-compile mode to reduce online-compile time for scenarios where the input shapes of online inference are limited. Hope this will help: https://github.com/Oneflow-Inc/diffusers/wiki/How-to-Run-OneFlow-Stable-Diffusion#optimization-for-multi-resolution-picture

@chavinlo

Thanks. First I call pipe.enable_graph_share_mem(), then run inference on the resolutions I want, and the graph should be ready for those resolutions, right?

```python
pipe.enable_graph_share_mem()

prompt = "a photo of an astronaut riding a horse on mars, red sky, (green sky:1.5)"
with torch.autocast("cuda"):
    images = pipe(prompt, height=1024).images
    images = pipe(prompt, height=768).images
    images = pipe(prompt, height=512).images
    images = pipe(prompt, height=256).images
```

I sort the input shapes from large to small to trigger graph compilation.

I see that it takes about 1 second to reload when the resolution changes:

```python
images = pipe(prompt, height=256).images
images = pipe(prompt, height=256).images
images = pipe(prompt, height=512).images
images = pipe(prompt, height=512).images
images = pipe(prompt, height=768).images
images = pipe(prompt, height=768).images
images = pipe(prompt, height=1024).images
images = pipe(prompt, height=1024).images
images = pipe(prompt, height=768).images
images = pipe(prompt, height=768).images
images = pipe(prompt, height=512).images
images = pipe(prompt, height=512).images
images = pipe(prompt, height=256).images
images = pipe(prompt, height=256).images
```

Is there any way to speed this up, or maybe keep all of them active? I don't mind using more VRAM.
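The per-call switch cost can be measured by timing each pipeline call. The sketch below shows only the timing pattern; `fake_pipe` is a stand-in that mimics the first-call-per-resolution overhead reported above (a real measurement would call the actual pipeline).

```python
import time

def timed_calls(pipe, heights):
    # Time each call so warm (already-compiled) runs can be compared
    # against the first call at each new resolution.
    timings = []
    for h in heights:
        start = time.perf_counter()
        pipe(h)
        timings.append((h, time.perf_counter() - start))
    return timings

# Stand-in "pipe" that pays a one-time cost per new height, mimicking
# the ~1 s graph-switch overhead described in this thread.
_seen = set()
def fake_pipe(height):
    if height not in _seen:
        _seen.add(height)
        time.sleep(0.05)  # pretend compile/switch cost

results = timed_calls(fake_pipe, [256, 256, 512, 512])
for h, dt in results:
    print(h, round(dt, 3))
```

With a real pipeline, the same harness would show whether the switch cost is paid once per resolution (warmup amortizes it) or on every resolution change.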

strint commented 1 year ago

You can run and read this test to get familiar with these two features: https://github.com/Oneflow-Inc/diffusers/blob/oneflow-fork/tests/test_pipelines_oneflow_graph_load.py

@chavinlo