scu-zjz / IMDLBenCo

[NeurIPS'24 Spotlight] A comprehensive benchmark & codebase for image manipulation detection/localization.
https://scu-zjz.github.io/IMDLBenCo-doc
Creative Commons Attribution 4.0 International

About input size #21

Open · chenhanch opened this issue 3 months ago

chenhanch commented 3 months ago

This is great work, with positive significance for the development of the IML community.

However, I have noticed that there is some confusion regarding the input image sizes for different methods in this project. For instance, in Trufor, the input image is randomly cropped to 512 × 512 during training, while the entire image, regardless of size, is used during testing. This is why the experimental results in Table 3 are different from those in the original paper.
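In torchvision-style code, this train/test asymmetry looks roughly as follows (a minimal sketch for illustration, not TruFor's actual pipeline):

```python
# Sketch of the asymmetric protocol described above (illustrative only).
import torchvision.transforms as T

# Training: random 512x512 crops (padding first if the image is smaller).
train_transform = T.Compose([
    T.RandomCrop(512, pad_if_needed=True),
    T.ToTensor(),
])

# Testing: the full image is passed to the network at its native resolution.
test_transform = T.ToTensor()
```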

Additionally, it seems that there is no standardization in current research on image manipulation detection regarding whether input images should be resized and what size they should be resized to. As a result, most comparisons are unfair, and the experimental conclusions are not reliable. This is a matter that deserves serious attention.

SunnyHaze commented 3 months ago

Hi, Han Chen!

Thank you for recognizing and engaging with our work!


> However, I have noticed that there is some confusion regarding the input image sizes for different methods in this project. For instance, in Trufor, the input image is randomly cropped to 512 × 512 during training, while the entire image, regardless of size, is used during testing. This is why the experimental results in Table 3 are different from those in the original paper.

This is a good question. In our experiments with TruFor, we indeed resized images to 512 × 512 for both training and testing, which differs from the original paper. We chose this approach for computational efficiency: although TruFor supports multi-resolution input, its FLOPs grow significantly with larger image sizes, as the table below shows.

| Method | Paper | Infer. Time (sec/image) | Params. (M) | FLOPs (G) @ 512×512 | FLOPs (G) @ 1024×1024 |
| --- | --- | --- | --- | --- | --- |
| MVSS-Net | ICCV21 & TPAMI22 | 2.929 | 147 | 167 | 683 |
| PSCC-Net | TCSVT22 | 0.072 | 3 | 120 | 416 |
| HiFi-Net | CVPR23 | 1.512 | 7 | 404 | 3470 |
| TruFor | CVPR23 | 1.231 | 68 | 231 | 1016 |
| IML-ViT | ArXiv | 0.094 | 91 | 136 | 576 |
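For context, FLOPs at different resolutions can be reproduced with an off-the-shelf profiler. A minimal sketch using fvcore, with resnet50 as a stand-in for any of the models above (not the benchmark's actual profiling code):

```python
# Sketch: profile FLOPs at the two resolutions in the table above.
# resnet50 is only a placeholder model, not one of the benchmarked networks.
import torch
from fvcore.nn import FlopCountAnalysis
from torchvision.models import resnet50

model = resnet50().eval()
with torch.no_grad():
    for size in (512, 1024):
        x = torch.randn(1, 3, size, size)  # single RGB image at this resolution
        gflops = FlopCountAnalysis(model, x).total() / 1e9
        print(f"{size}x{size}: {gflops:.0f} GFLOPs")
```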

> Additionally, it seems that there is no standardization in current research on image manipulation detection regarding whether input images should be resized and what size they should be resized to. As a result, most comparisons are unfair, and the experimental conclusions are not reliable. This is a matter that deserves serious attention.

Regarding your second point, the lack of standardization in image resizing is indeed a chaotic aspect of the current research landscape. Highlighting and addressing this issue is one of the primary reasons we wrote this paper. By explicitly stating the resolutions used in different studies, we hope to bring attention to this problem and help the field achieve more reliable and consistent conclusions.

Knightzjz commented 3 months ago

Thanks for the information, Han Chen. We have recently received some similar questions and suggestions; please refer to Issue #22 for our detailed explanations.

chchshshhh commented 2 weeks ago

Thank you very much for your outstanding contributions to the field of image manipulation detection. Regarding the input image size, would you consider using a unified experimental setup across all models by resizing the images to 512×512? I believe this would be the fairest way to make comparisons.

SunnyHaze commented 2 weeks ago

> Thank you very much for your outstanding contributions to the field of image manipulation detection. Regarding the input image size, would you consider using a unified experimental setup across all models by resizing the images to 512×512? I believe this would be the fairest way to make comparisons.

Thanks for your attention to our project. However, each network has its own appropriate input resolution and design, and at those resolutions the computational complexity of the models stays within the same order of magnitude. Therefore, our strategy is to 'stay true to the original work,' implementing each model as closely as possible to the design presented in its original paper.
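In practice, this means the input pipeline is configured per model rather than by one global resolution. A hypothetical sketch of what such a per-model setting could look like (the keys and values below are illustrative, not IMDLBenCo's actual configuration):

```python
# Hypothetical per-model input settings illustrating the
# "stay true to the original work" strategy; illustrative only,
# not IMDLBenCo's actual configuration.
MODEL_INPUT_CONFIG = {
    "model_a": {"resize": (512, 512)},                      # fixed-resolution input
    "model_b": {"train_crop": (512, 512), "test": "full"},  # crop at train, full image at test
    "model_c": {"pad_to": (1024, 1024)},                    # zero-pad instead of resizing
}
```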