siliconflow / onediff

OneDiff: An out-of-the-box acceleration library for diffusion models.
https://github.com/siliconflow/onediff/wiki
Apache License 2.0

Add dynamic shape demo for diffusers sd3 #953

Closed lixiang007666 closed 1 week ago

lixiang007666 commented 2 weeks ago

This PR is done:

Env: torch==2.3.0

Test log:

compile...
Starting warmup...
Warmup complete.
Generated image saved to sd3_compile.png in 0.53 seconds.
Max used CUDA memory : 16.999GiB
Test run with multiple resolutions...
Running at resolution: 1536x1536
Inference time: 4.49 seconds
Running at resolution: 1536x1024
Inference time: 2.60 seconds
Running at resolution: 1536x768
Inference time: 1.79 seconds
Running at resolution: 1536x720
Inference time: 1.69 seconds
Running at resolution: 1536x576
Inference time: 1.31 seconds
Running at resolution: 1536x512
Inference time: 1.15 seconds
Running at resolution: 1536x256
Inference time: 0.65 seconds
Running at resolution: 1024x1536
Inference time: 2.62 seconds
Running at resolution: 1024x1024
Inference time: 1.59 seconds
Running at resolution: 1024x768
Inference time: 1.12 seconds
Running at resolution: 1024x720
Inference time: 1.06 seconds
Running at resolution: 1024x576
Inference time: 0.84 seconds
Running at resolution: 1024x512
Inference time: 0.79 seconds
Running at resolution: 1024x256
Inference time: 0.52 seconds
Running at resolution: 768x1536
Inference time: 1.82 seconds
Running at resolution: 768x1024
Inference time: 1.14 seconds
Running at resolution: 768x768
Inference time: 0.87 seconds
Running at resolution: 768x720
Inference time: 0.82 seconds
Running at resolution: 768x576
Inference time: 0.66 seconds
Running at resolution: 768x512
Inference time: 0.64 seconds
Running at resolution: 768x256
Inference time: 0.50 seconds
Running at resolution: 720x1536
Inference time: 1.67 seconds
Running at resolution: 720x1024
Inference time: 1.07 seconds
Running at resolution: 720x768
Inference time: 0.81 seconds
Running at resolution: 720x720
Inference time: 0.84 seconds
Running at resolution: 720x576
Inference time: 0.65 seconds
Running at resolution: 720x512
Inference time: 0.61 seconds
Running at resolution: 720x256
Inference time: 0.50 seconds
Running at resolution: 576x1536
Inference time: 1.30 seconds
Running at resolution: 576x1024
Inference time: 0.88 seconds
Running at resolution: 576x768
Inference time: 0.66 seconds
Running at resolution: 576x720
Inference time: 0.64 seconds
Running at resolution: 576x576
Inference time: 0.57 seconds
Running at resolution: 576x512
Inference time: 0.50 seconds
Running at resolution: 576x256
Inference time: 0.48 seconds
Running at resolution: 512x1536
Inference time: 1.16 seconds
Running at resolution: 512x1024
Inference time: 0.79 seconds
Running at resolution: 512x768
Inference time: 0.60 seconds
Running at resolution: 512x720
Inference time: 0.61 seconds
Running at resolution: 512x576
Inference time: 0.50 seconds
Running at resolution: 512x512
Inference time: 0.48 seconds
Running at resolution: 512x256
Inference time: 0.47 seconds
Running at resolution: 256x1536
Inference time: 0.64 seconds
Running at resolution: 256x1024
Inference time: 0.51 seconds
Running at resolution: 256x768
Inference time: 0.51 seconds
Running at resolution: 256x720
Inference time: 0.50 seconds
Running at resolution: 256x576
Inference time: 0.49 seconds
Running at resolution: 256x512
Inference time: 0.48 seconds
Running at resolution: 256x256
Inference time: 0.46 seconds
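
The log above times one generation for every pairing of seven edge lengths. A minimal sketch of such a timing loop follows. It is not the demo script from this PR: `benchmark_resolutions` and `fake_generate` are hypothetical names, `fake_generate` stands in for the compiled pipeline call, and the height-by-width order of the printed resolution is an assumption.

```python
import itertools
import time

# Edge lengths that appear in the log above. Whether the log prints
# height x width or width x height is not stated, so the order is assumed.
SIZES = [1536, 1024, 768, 720, 576, 512, 256]


def benchmark_resolutions(generate, sizes=SIZES):
    """Time one call of `generate(height, width)` for every size pair."""
    timings = {}
    for height, width in itertools.product(sizes, repeat=2):
        start = time.perf_counter()
        generate(height, width)
        elapsed = time.perf_counter() - start
        timings[(height, width)] = elapsed
        print(f"Running at resolution: {height}x{width}")
        print(f"Inference time: {elapsed:.2f} seconds")
    return timings


def fake_generate(height, width):
    """Hypothetical stand-in for the compiled pipeline call."""
    return [[0] * (width // 64) for _ in range(height // 64)]
```

Calling `benchmark_resolutions(fake_generate)` prints one pair of lines per resolution in the same order as the log. With a real GPU pipeline, a call to `torch.cuda.synchronize()` before reading the clock is usually needed so that asynchronous kernels are included in the measurement.
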
lixiang007666 commented 2 weeks ago

diffusers truncates the input prompt (limit of 77 tokens): https://github.com/huggingface/diffusers/blob/8e1b7a084addc4711b8d9be2738441dfad680ce0/src/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py#L238

This PR also adds a test that switches prompts dynamically. Unlike in comfy, switching prompts does not trigger a recompile.
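
A likely reason prompt switching does not trigger a recompile is that truncating to a fixed token limit, together with padding, gives the text encoder the same input shape for every prompt. The sketch below illustrates that idea in plain Python under stated assumptions: `to_fixed_length` is a hypothetical helper, not the diffusers implementation, the token ids are made up, and special tokens such as begin/end markers are ignored.

```python
MAX_TOKENS = 77  # token limit mentioned in the comment above


def to_fixed_length(token_ids, max_length=MAX_TOKENS, pad_id=0):
    """Truncate or right-pad a token-id list to exactly `max_length`."""
    clipped = token_ids[:max_length]
    return clipped + [pad_id] * (max_length - len(clipped))


short_prompt = list(range(1, 11))   # 10 hypothetical token ids
long_prompt = list(range(1, 201))   # 200 hypothetical token ids

# Both prompts end up with the same length, so a compiled graph that was
# traced for one of them sees an identically shaped input for the other.
print(len(to_fixed_length(short_prompt)), len(to_fixed_length(long_prompt)))
# prints: 77 77
```
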

strint commented 2 weeks ago

> diffusers truncates the input prompt (limit of 77 tokens): https://github.com/huggingface/diffusers/blob/8e1b7a084addc4711b8d9be2738441dfad680ce0/src/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py#L238
>
> This PR also adds a test that switches prompts dynamically. Unlike in comfy, switching prompts does not trigger a recompile.

You could add this as a comment in the multi-prompt part.