Firstly, thank you for your great work. I have a question about calculating the FVD score.
(1) In Appendix B.2 (Metrics) of the PVDM paper, it is mentioned, "We sample 2,048 samples (or the size of the real data if it is smaller) for calculating real statistics and 2,048 samples for evaluating fake statistics."
For the SKY-Timelapse dataset, it was noted that there are only 196 real samples available, so real statistics were calculated using these 196 samples. However, if we assume that 2,048 fake samples are generated as mentioned, there will be a difference in the number of real and fake samples. Is there any potential issue when comparing statistics between real and fake samples due to this difference in quantity?
As mentioned in the Appendix, we use the train split to calculate the real statistics, following the protocol used in StyleGAN-V. The 196 real samples correspond to the test split.
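On the general question of unequal sample counts: FVD is a Fréchet distance between two Gaussians, each fitted by a mean and a covariance of the extracted features, so the formula itself does not require the real and fake sets to be the same size. What the count affects is how well those statistics are estimated. Here is a minimal sketch of that computation, assuming synthetic Gaussian features in place of real I3D embeddings; the function names and the 16-dimensional feature size are hypothetical and not taken from the PVDM or StyleGAN-V code.

```python
import numpy as np


def feature_stats(feats):
    """Mean vector and covariance matrix of an (N, D) feature array."""
    mu = feats.mean(axis=0)
    sigma = np.cov(feats, rowvar=False)
    return mu, sigma


def frechet_distance(mu_r, sigma_r, mu_f, sigma_f):
    """Frechet distance between N(mu_r, sigma_r) and N(mu_f, sigma_f).

    tr(sqrt(sigma_r @ sigma_f)) is computed from the eigenvalues of the
    product, which are real and non-negative for PSD inputs up to
    numerical noise (hence the .real and the clip).
    """
    diff = mu_r - mu_f
    eigvals = np.linalg.eigvals(sigma_r @ sigma_f)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_f) - 2.0 * tr_sqrt)


# Assumption: random stand-ins for video features, all drawn from the
# same distribution, so any nonzero distance is pure estimation error.
rng = np.random.default_rng(0)
dim = 16
real_small = rng.normal(size=(196, dim))   # e.g. a small real set
real_large = rng.normal(size=(2048, dim))  # e.g. a larger real set
fake = rng.normal(size=(2048, dim))

fd_small = frechet_distance(*feature_stats(real_small), *feature_stats(fake))
fd_large = frechet_distance(*feature_stats(real_large), *feature_stats(fake))
fd_self = frechet_distance(*feature_stats(fake), *feature_stats(fake))

print("196 real vs 2048 fake: ", fd_small)
print("2048 real vs 2048 fake:", fd_large)
print("fake vs itself:        ", fd_self)
```

Both calls run without any size matching. Because all three sets here come from the same distribution, the gap between the first two printed values gives a rough feel for how much a smaller real set can shift the score through noisier statistics alone. Note also that when the real set has fewer samples than the feature dimension, its sample covariance is rank-deficient, which is one more reason a larger real split is preferable.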