sonwe1e opened 10 months ago
My point is that if timestep t is a key timestep and t+1, t+2, t+3 are non-key, then the decoders at t+1, t+2, t+3 all reuse the features $f_t$ from timestep t. According to the parallel steps in the paper, t+1, t+2, t+3 all need to decode $f_t$, yet these timesteps do not use the encoder. So what is the purpose of the results obtained from this decoding?
I hope I have made my question clear. Thanks!
Even though the UNet encoder is not used at non-key timesteps, the decoder still receives the encoder features shared from the key timestep and outputs the predicted noise $\epsilon$, which is used to update $z_t$. I hope I understand your question correctly.
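To make this concrete, here is a minimal sketch of the caching loop described above. The helper names `run_encoder` / `run_decoder` and the `scheduler.step(...).prev_sample` call are assumptions for illustration, not the repository's actual API: key timesteps run the full UNet and cache the encoder features, non-key timesteps skip the encoder and feed the cached features into the decoder, but the predicted noise still updates $z_t$ at every step.

```python
# Minimal sketch of the feature-caching idea (assumed helper names; not the
# actual implementation). Key timesteps run the full UNet and cache the
# encoder features; non-key timesteps skip the encoder and reuse the cached
# features in the decoder to predict the noise.

def denoise_with_caching(unet, scheduler, z, timesteps, cache_interval=3):
    cached_feats, cached_skips = None, None   # features from the last key timestep
    for i, t in enumerate(timesteps):
        if i % cache_interval == 0:           # key timestep: full forward pass
            cached_feats, cached_skips = unet.run_encoder(z, t)       # hypothetical helper
            eps = unet.run_decoder(cached_feats, cached_skips, z, t)  # hypothetical helper
        else:                                 # non-key timestep: encoder skipped
            eps = unet.run_decoder(cached_feats, cached_skips, z, t)
        # The predicted noise updates the latent at every step, key or not.
        z = scheduler.step(eps, t, z).prev_sample
    return z
```

This also shows why the non-key steps cannot simply be dropped: the sampler still applies an update to $z_t$ at every timestep, so each non-key step needs a (cheaply obtained) noise prediction.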
Thank you for your answer; it has nicely resolved my doubts. I made a silly mistake.
Thank you again for your response. I have another question. From the graph, it seems that a smaller interval in the Uniform method means fewer skipped encoders, which should make it closer to the original diffusion process. Why, then, is the performance of I worse than that of II?
Great work on the study, but I have some queries I'd like to ask.
If the timesteps considered non-key skip the encoder entirely, how are images decoded at these non-key timesteps from the features produced by the key-timestep encoder? Since the encoders at non-key timesteps are skipped, there wouldn't be any encoding at time t+1 either. Why not skip the non-key steps altogether?