johannakarras / DreamPose

Official implementation of "DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion"
MIT License
955 stars, 74 forks

Paper typo: wrong equation in Background section for decoding process #66

Open gh-BumsooKim opened 6 months ago

gh-BumsooKim commented 6 months ago

Thank you for your excellent research on image-to-video synthesis and for sharing the code publicly.

I found what appears to be an incorrect equation in both the arXiv paper (https://arxiv.org/pdf/2304.06025.pdf) and the ICCV 2023 paper (https://openaccess.thecvf.com/content/ICCV2023/papers/Karras_DreamPose_Fashion_Video_Synthesis_with_Stable_Diffusion_ICCV_2023_paper.pdf).

In the Background section, I think the equation between Eq. (1) and Eq. (2) might be wrong:

[image: the equation in question]

I think it should be $x^\prime = \mathcal{D}(z^\prime)$ (x' = D(z')), because the quantity the decoder maps back from latent space is the latent produced by the VAE encoder, which is denoted z'. The equation is the same in the arXiv version and the published ICCV version.
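
For reference, the encode/decode relationship behind this suggestion can be written out as below. This is only a sketch under the usual latent diffusion notation: the encoder symbol $\mathcal{E}$ and the input image $x$ are assumptions for illustration and are not quoted from the paper.

```latex
\begin{align}
  z^\prime &= \mathcal{E}(x)          && \text{(assumed notation) the VAE encoder maps an image to a latent} \\
  x^\prime &= \mathcal{D}(z^\prime)   && \text{the VAE decoder maps that latent back to image space}
\end{align}
```

The point is that the argument of $\mathcal{D}$ should be a latent ($z^\prime$), while its output is an image ($x^\prime$).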

I believe that correcting the equation would avoid confusion for other researchers, but please confirm whether my suggestion above is right. Thank you.