plai-group / flexible-video-diffusion-modeling

MIT License

Log-uniform scale sampling #6

Closed neverix closed 5 months ago

neverix commented 1 year ago

The paper states that group scales are sampled log-uniformly when creating training samples.

However, the implementation subtracts 0.999 from n_group before exponentiating. This means that the sampled scale can be up to 1000x larger than stated: when n_group = 1, the denominator becomes 0.001 instead of 1. Is this intentional?
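
For concreteness, a tiny numerical illustration of the 1000x factor (not the repository's code; `N = 100` is a hypothetical value):

```python
# Illustration only, not the repository's code; N = 100 is a hypothetical value.
N = 100
n_group = 1

upper_without_offset = N / n_group            # 100.0
upper_with_offset = N / (n_group - 0.999)     # ~100000.0
print(upper_with_offset / upper_without_offset)  # ~1000.0 -- the bound is ~1000x larger
```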

wsgharvey commented 5 months ago

Hi, thanks for pointing this out and sorry for the slow response.

I have double-checked and the code used in our experiments is the same as in the implementation we link to. It seems that this is a mistake in the algorithm in the paper, and is not intentional.

It looks like the difference is that the upper bound on the log-uniform distribution is $\frac{N}{n_\text{group} - 0.999}$ in the code and $\frac{N-1}{n_\text{group}}$ in the algorithm. These will be quite different when $n_\text{group}$ is small, most dramatically when $n_\text{group} = 1$.

I would say that the version in the code makes more sense as it allows for a broader distribution: e.g., when we want to space three points across a distance $N$, it is possible to space them up to $\frac{N}{2}$ apart, not just $\frac{N}{3}$. Arguably an ideal expression for the upper bound would be $\frac{N}{n_\text{group} - 1}$, but I think we edited this to $\frac{N}{n_\text{group} - 0.999}$ as a simple way to avoid NaNs when $n_\text{group} = 1$.
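
For anyone reading along, here is a minimal sketch of sampling under the two bounds. It assumes log-uniform sampling with NumPy; the function name, variable names, and values are illustrative, not taken from the repository:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_scale(N, n_group, use_code_bound=True):
    """Sketch of log-uniform scale sampling; illustrative, not the repo's code."""
    if use_code_bound:
        upper = N / (n_group - 0.999)  # upper bound used in the code
    else:
        upper = (N - 1) / n_group      # upper bound stated in the paper's algorithm
    # Log-uniform: sample uniformly in log-space, then exponentiate.
    return np.exp(rng.uniform(np.log(1.0), np.log(upper)))

# Hypothetical example: N = 100 frames, n_group = 3.
# Code bound:  100 / (3 - 0.999) ~= 50  (roughly N / 2)
# Paper bound: (100 - 1) / 3      = 33  (roughly N / 3)
print(sample_scale(100, 3, use_code_bound=True))
print(sample_scale(100, 3, use_code_bound=False))
```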

We will update the paper on arXiv in the next few weeks and fix the algorithm there.

Thanks again for pointing this out!