Thank you for your excellent work on image-to-video synthesis and for sharing the code publicly.
I found what looks like an incorrect equation in both the arXiv version (https://arxiv.org/pdf/2304.06025.pdf) and the ICCV 2023 version (https://openaccess.thecvf.com/content/ICCV2023/papers/Karras_DreamPose_Fashion_Video_Synthesis_with_Stable_Diffusion_ICCV_2023_paper.pdf) of the paper.
In the Background section, I think the equation between Eq. (1) and Eq. (2) might be wrong.
I think it should be $x^\prime = \mathcal{D}(z^\prime)$ (x' = D(z')), because the variable that the decoder maps back from the latent space is the one produced by the VAE encoder, which is denoted z'. The equation is identical in the arXiv version and the published ICCV version.
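
To make the suggestion concrete, here is the round trip as I read it. This is only a sketch of the notation: the encoder symbol $\mathcal{E}$ and the input image $x$ are my assumptions, since only $\mathcal{D}$, $x^\prime$, and $z^\prime$ appear in my suggestion above.

$$
z^\prime = \mathcal{E}(x), \qquad x^\prime = \mathcal{D}(z^\prime)
$$

Written this way, the argument of the decoder is the same symbol that the encoder produces, so the reconstruction $x^\prime$ is tied explicitly to the encoded latent $z^\prime$.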
I believe correcting the equation would avoid confusing other researchers, but please confirm whether my suggestion above is right. Thank you.