horseee / DeepCache

[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
https://horseee.github.io/Diffusion_DeepCache/
Apache License 2.0

When I enable DeepCache, my images are generated significantly more distorted #27

Open DimkaTsv opened 4 months ago

DimkaTsv commented 4 months ago

I know I am a 7800 XT user running through the new and only partly working ZLUDA, so I may be asking too much. Still, I have a question (or a bug report, or a request, depending on how you look at it).

When I enable DeepCache, even with an interval of 2 (by "steps" below I mean the DeepCache cache interval), my images are generated with severe flaws that grow rapidly as the cache interval increases. My guess is that DeepCache somehow amplifies the model's inherent flaws until they can no longer be contained.

I used the SD 2.1 models; the examples below include both 512x512 and 768x768 generations on the respective models. The prompt used everywhere is simple but practical: "Tree in the forest". I used the default sampler with 20 steps.

A decently large negative prompt does help, reducing the distortion to a great extent, but it only works up to a certain cache interval.

The negative prompt (when used) was taken from another experiment of mine [that is why mountain peaks and road are there, and fish eye too, since I used "wide angle"]: "lowres, low quality, low details, overexposed, overcontrast, oversaturated, text, watermark, excessively grainy, deformed, tiling, deformed foliage, inaccurate sky, deformed trees, deformed leaves, bad reflections, (unrealistic:1.1), smeary, deformed mountain peaks, road, oil drawing, cropped textures, flat textures, fish eye"

With the 512x512 model, it starts with oversaturation even at the default 3 steps. Next come reduced detail and a mosaic feel, and at a cache interval of about 5 the results turn into a "game of light, shadow and color". The negative prompt works up to about 3 steps; beyond that, generation quickly degrades.

512x512 = no deep-cache image

512x512 = 3 steps - oversaturation image

512x512 = 5 steps - light, shadow and color "game" image

512x512 = 3 steps + negative prompt image

With the 768x768 model, some pictures look like paintings even without DeepCache, and leaves often look like spots or a mosaic. But at 2 steps ALL images generated from the prompt tend to look like oil paintings or have a mosaic feel, while at 5 steps images start to look as if they were fractured and shuffled a bit. The negative prompt only helps up to 2 steps; beyond that, quality quickly degrades.

768x768 = no deep-cache No_deep_cache_768

768x768 = 3 steps - "oil painting" + mosaic feel Bugged_deep_cache_2

768x768 = 5 steps - "fractured" and/or "mosaic" images Bugged_deep_cache_768_5step

768x768 = 10 steps - basically a kaleidoscope. Kinda. Looks pretty unique though, ngl. Bugged_deep_cache_768_10step

768x768 = 2 steps + negative prompt Negative_prompt_deep_cache_2step_768

768x768 = 3 steps + negative prompt Negative_prompt_deep_cache_3step_768

I can see a slight mosaic feel or spottiness on leaves with the 768x768 model even without DeepCache; that seems to be an inherent problem. But the abstract "fracturing" and that level of "spottiness" go well beyond it. I also cannot understand why the negative prompt compensated for DeepCache to some extent. Should it even work like that?

Sorry for bothering you with my rambling and complaints. Writing this issue also wasn't easy: every time I changed the model or DeepCache-related parameters, I had to completely restart SD webui. I wish DeepCache wasn't useful for me, but it really does speed up generation noticeably. In its current state, though, it seems I just cannot afford to use it. That is the only reason I decided, in the end, to write this up.

horseee commented 4 months ago

Hi @DimkaTsv,

Due to the mechanism of DeepCache, using large cache intervals during inference with few sampling steps may result in image distortion. We are still in the process of improving the algorithm for this scenario.
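
As a rough illustration of why the interval matters so much at 20 sampling steps, here is a minimal sketch. It assumes a uniform caching schedule in which the full U-Net runs on every N-th step starting from step 0 and the steps in between reuse cached deep features. The function names are hypothetical and this is not the code of the webui extension, whose scheduling may differ.

```python
def full_compute_steps(num_steps: int, cache_interval: int) -> list[int]:
    """Step indices that run the full U-Net, assuming a uniform schedule."""
    if num_steps < 1 or cache_interval < 1:
        raise ValueError("num_steps and cache_interval must be >= 1")
    # Assumption: the cache is refreshed at step 0 and then every
    # `cache_interval` steps; all other steps reuse the cached features.
    return list(range(0, num_steps, cache_interval))


def cached_fraction(num_steps: int, cache_interval: int) -> float:
    """Fraction of sampling steps that rely on cached (stale) deep features."""
    full = len(full_compute_steps(num_steps, cache_interval))
    return 1.0 - full / num_steps


if __name__ == "__main__":
    num_steps = 20  # the step count used in this issue
    for interval in (1, 2, 3, 5, 10):
        full = full_compute_steps(num_steps, interval)
        frac = cached_fraction(num_steps, interval)
        print(f"interval={interval:2d}: {len(full):2d} full steps, "
              f"{frac:.0%} of steps use cached features")
```

Under these assumptions, an interval of 3 leaves only 7 of the 20 steps with a full forward pass, and an interval of 10 leaves just 2, so the cached features have to stand in for a large part of the denoising trajectory.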

DimkaTsv commented 4 months ago

Thanks for the quick response. Seems like my guess was correct.

But was it supposed to kick in that fast (at a cache interval of 3)? Or does it work better on real CUDA, and the issue comes up because I use ZLUDA and something isn't being translated properly? There are two repeated calls that cannot be translated properly due to HIP (or something like that), and I regularly get warned about them. But they appear basically everywhere, regardless of the action, so I am not sure.

:1:C:\constructicon\builds\gfx\two\23.30\drivers\compute\clr\hipamd\src\hip_code_object.cpp:616 : 120376714666 us: [pid:25004 tid:0x5c48] Cannot find the function: Cijk_Ailk_Bljk_BBS_BH_MT64x64x32_MI16x16x16x1_SN_1LDSB0_APM1_AF0EM1_AF1EM1_AMAS3_ASAE01_ASCE01_ASEM1_BL1_BS1_DTL0_EPS1_FL0_GRVW4_GSU1_GSUASB_ISA1101_IU1_K1_KLA_LBSPP128_LPA0_LPB8_LDL1_LRVW16_MIAV1_MDA2_MMFGLC_NLCA1_NLCB1_ONLL1_PK0_PGR1_PLR1_RK0_SIA1_SS1_SU0_SUM0_SUS0_SVW2_SNLL0_TT2_32_TLDS1_UMLDSA0_UMLDSB1_USFGROn1_VAW1_VSn1_VW2_WSGRA1_WSGRB1_WS32_WG32_4_1_WGM8
:1:C:\constructicon\builds\gfx\two\23.30\drivers\compute\clr\hipamd\src\hip_module.cpp:83  : 120376714750 us: [pid:25004 tid:0x5c48] Cannot find the function: Cijk_Ailk_Bljk_BBS_BH_MT64x64x32_MI16x16x16x1_SN_1LDSB0_APM1_AF0EM1_AF1EM1_AMAS3_ASAE01_ASCE01_ASEM1_BL1_BS1_DTL0_EPS1_FL0_GRVW4_GSU1_GSUASB_ISA1101_IU1_K1_KLA_LBSPP128_LPA0_LPB8_LDL1_LRVW16_MIAV1_MDA2_MMFGLC_NLCA1_NLCB1_ONLL1_PK0_PGR1_PLR1_RK0_SIA1_SS1_SU0_SUM0_SUS0_SVW2_SNLL0_TT2_32_TLDS1_UMLDSA0_UMLDSB1_USFGROn1_VAW1_VSn1_VW2_WSGRA1_WSGRB1_WS32_WG32_4_1_WGM8 for module:

Maybe the issue is caused by this? If there is a debug environment variable for DeepCache, I can try to debug it, if that would help.

I won't ask you to fix something that is caused by ZLUDA or HIP if it is hard or impossible to do on your side, though. I know that translation-layer quirks should be handled by the translation-layer developers, if possible.