shiml20 / FlowTurbo

Official PyTorch Implementation of "FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner"
https://flowturbo.ivg-research.xyz/
MIT License

Regarding reproduction for given quantitative results on ImageNet 256x256 #4

Closed dlsrbgg33 closed 3 weeks ago

dlsrbgg33 commented 3 weeks ago

Hi, thank you for your great work and sharing the code and checkpoint.

I have two questions regarding the quantitative scores presented in the paper:

  1. I attempted to reproduce the scores for H1P5R3 and H8P9R5 from Table 2. With the provided checkpoint I was able to match the FID for H1P5R3, but for H8P9R5 my evaluation gave 2.33 versus the 2.12 reported in the paper. To investigate further, I switched from fp16 to fp32 and obtained an FID of 2.27, which still differs from the reported score. I also trained the model myself for 30k iterations but ended up with an FID of 2.29. Could you provide any insight into why my FID for H8P9R5 is higher? Could the 2.12 FID actually correspond to H7P10R4, as indicated in Table 4?

  2. Additionally, could you offer some insight into why FlowTurbo outperforms SiT? My understanding is that the velocity refiner predicts the offset between the SiT output and the true velocity, which leads me to believe that SiT should theoretically be an upper bound on FlowTurbo's performance.

I hope you can share some insights on the above questions. Thank you in advance.

shiml20 commented 3 weeks ago


Thank you very much for your interest in our work. Regarding the first question: The differences between your experimental results and ours can be attributed to the following reasons:

Regarding the second question: Our insights suggest that flow-based models like SiT may exhibit information redundancy, particularly in model weights. This observation motivated us to design a lightweight refiner to regress the offset. Consequently, the trajectories generated by the model may differ when using the refiner compared to the original model. Moreover, the original SiT model requires significantly more time to achieve similar generative quality. Our approach reduces this computational overhead, highlighting the information redundancy inherent in flow-based models.
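The refiner idea described above can be sketched as follows: at each sampling step, a lightweight network adds a learned correction to the heavy base model's predicted velocity before the ODE step. This is a minimal illustrative sketch, not the repository's actual API; `base_velocity` and `refiner` are pure-Python stand-ins for the two networks, and all names are hypothetical.

```python
import numpy as np

def base_velocity(x, t):
    # Stand-in for the heavy flow model's velocity prediction v_theta(x, t).
    return -x * (1.0 - t)

def refiner(x, t):
    # Stand-in for the lightweight refiner, which regresses the offset
    # between the base model's prediction and the true velocity.
    return 0.1 * x

def refined_euler_step(x, t, dt):
    # Refined velocity = base prediction + learned offset; then one Euler step.
    v = base_velocity(x, t) + refiner(x, t)
    return x + dt * v

x = np.ones(4)
x_next = refined_euler_step(x, t=0.0, dt=0.1)
```

Because the refiner is much cheaper to evaluate than the base model, replacing some base-model calls with refiner-corrected steps is what reduces the sampling cost without a matching loss in quality.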