[SD3] Incorrect stochastic sampling implementation

Luciennnnnnn commented 1 month ago

Describe the bug

Referring to Algorithm 2 of EDM: for a given sample $x_t$, noise is first added to reach a higher noise level $\hat{t}$, and the network is then evaluated with $\hat{x}_t$ and $\hat{t}$ as input. However, the current implementation evaluates the network with $x_t$ and $t$ as input, which is inconsistent with that definition.

The current implementation is closer in spirit to Euler–Maruyama. Quoting EDM: "One can interpret Euler–Maruyama as first adding noise and then performing an ODE step, not from the intermediate state after noise injection, but assuming that $x$ and $\sigma$ remained at the initial state at the beginning of the iteration step."
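
To make the distinction concrete, here is a minimal sketch of the two update rules. It is not the diffusers scheduler code: `denoise` is a hypothetical stand-in for the network (the ideal denoiser for toy Gaussian data with an assumed `SIGMA_DATA`), only the first-order Euler part of EDM Algorithm 2 is shown (no second-order correction), and all function names are made up for illustration.

```python
import numpy as np

SIGMA_DATA = 0.5  # assumed std of the toy data distribution (illustrative only)


def denoise(x, sigma):
    """Hypothetical stand-in for the network: ideal denoiser for x0 ~ N(0, SIGMA_DATA^2)."""
    return x * SIGMA_DATA**2 / (SIGMA_DATA**2 + sigma**2)


def churn(x, t, gamma, eps):
    """Raise the noise level from t to t_hat = (1 + gamma) * t by adding fresh noise."""
    t_hat = t + gamma * t
    x_hat = x + np.sqrt(t_hat**2 - t**2) * eps
    return x_hat, t_hat


def edm_stochastic_step(x, t, t_next, gamma, eps):
    """EDM Algorithm 2 (Euler part): evaluate the denoiser at the noised state (x_hat, t_hat)."""
    x_hat, t_hat = churn(x, t, gamma, eps)
    d = (x_hat - denoise(x_hat, t_hat)) / t_hat
    return x_hat + (t_next - t_hat) * d


def euler_maruyama_like_step(x, t, t_next, gamma, eps):
    """Same noise injection, but the denoiser is evaluated at the initial state (x, t)."""
    x_hat, t_hat = churn(x, t, gamma, eps)
    d = (x - denoise(x, t)) / t
    return x_hat + (t_next - t_hat) * d


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = 2.0 * rng.standard_normal(4)
    eps = rng.standard_normal(4)
    print(edm_stochastic_step(x, 2.0, 1.0, 0.4, eps))
    print(euler_maruyama_like_step(x, 2.0, 1.0, 0.4, eps))
```

With `gamma = 0` no noise is injected and the two functions return the same result. They only diverge once `gamma > 0`, which is the case this issue is about.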

Reproduction

no

Logs

No response

System Info

no

Who can help?

@yiyixuxu @sayakpaul

sayakpaul commented 1 month ago

Cc: @kashif

Can you point us to the line of code you're referring to?

Luciennnnnnn commented 1 month ago

@sayakpaul Sure, but it is hard to point to a specific line, since the issue concerns the algorithm as a whole. The problem is in the step function: https://github.com/huggingface/diffusers/blob/a899e42fc78fbd080452ce88d00dbf704d115280/src/diffusers/schedulers/scheduling_flow_match_euler_discrete.py#L189

geroldmeisinger commented 1 month ago

All sampler and scheduler permutations for Stable Diffusion 3: https://www.reddit.com/r/StableDiffusion/comments/1dh5n7k

yiyixuxu commented 1 month ago

The default scheduler for SD3 uses flow matching instead of a diffusion process, i.e., no noise is added in the forward process.
