Open chenhanch opened 3 months ago
Hi! Han Chen!
Thank you for recognizing and engaging with our work!
However, I have noticed that there is some confusion regarding the input image sizes for different methods in this project. For instance, in TruFor, the input image is randomly cropped to 512 × 512 during training, while the entire image, regardless of size, is used during testing. This is why the experimental results in Table 3 are different from those in the original paper.
This is a good question. In our experiments with TruFor, we did resize images to 512 × 512 for both training and testing, which differs from the original paper. We chose this for computational efficiency: although TruFor supports multi-resolution input, its FLOPs grow sharply with image size, from 231 G at 512 × 512 to 1016 G at 1024 × 1024.
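To make the difference between the two protocols concrete, here is a minimal sketch. It is an illustration only, not the code used in this repository or in TruFor: the function names `train_crop_box` and `network_input_size` and the protocol labels `"full_image"` and `"resize"` are hypothetical, and the sketch only tracks image geometry rather than pixel data.

```python
import random


def train_crop_box(width, height, crop=512, rng=random):
    """Pick a random crop x crop window, as described for TruFor training.

    Returns (left, top, right, bottom) in pixel coordinates.
    """
    if width < crop or height < crop:
        raise ValueError("image is smaller than the crop size")
    left = rng.randint(0, width - crop)   # randint is inclusive on both ends
    top = rng.randint(0, height - crop)
    return left, top, left + crop, top + crop


def network_input_size(width, height, protocol, size=512):
    """Spatial size (width, height) the network sees at test time."""
    if protocol == "full_image":   # hypothetical label: whole image, any size
        return width, height
    if protocol == "resize":       # hypothetical label: fixed size x size input
        return size, size
    raise ValueError(f"unknown protocol: {protocol}")


box = train_crop_box(1024, 768, rng=random.Random(0))
print(box)
print(network_input_size(1024, 768, "full_image"))
print(network_input_size(1024, 768, "resize"))
```

Under the full-image protocol the test-time input size varies per image, whereas under the resize protocol it is fixed, which is one reason the two setups can yield different scores for the same model.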
Method | Venue | Infer. time (s/image) | Params (M) | FLOPs at 512×512 (G) | FLOPs at 1024×1024 (G) |
---|---|---|---|---|---|
MVSS-Net | ICCV21 & TPAMI22 | 2.929 | 147 | 167 | 683 |
PSCC-Net | TCSVT22 | 0.072 | 3 | 120 | 416 |
HiFi-Net | CVPR23 | 1.512 | 7 | 404 | 3470 |
TruFor | CVPR23 | 1.231 | 68 | 231 | 1016 |
IML-ViT | ArXiv | 0.094 | 91 | 136 | 576 |
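As a quick check on how FLOPs scale with resolution, the ratios can be computed directly from the table. This is a small sketch using only the numbers above; the dictionary `flops_g` and the helper `scaling_factor` are names introduced here for illustration.

```python
# FLOPs in G, copied from the table: (at 512x512, at 1024x1024)
flops_g = {
    "MVSS-Net": (167, 683),
    "PSCC-Net": (120, 416),
    "HiFi-Net": (404, 3470),
    "TruFor": (231, 1016),
    "IML-ViT": (136, 576),
}


def scaling_factor(flops_low, flops_high):
    """How many times the FLOPs grow when going from 512x512 to 1024x1024."""
    return flops_high / flops_low


ratios = {name: scaling_factor(low, high) for name, (low, high) in flops_g.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

Doubling the side length quadruples the pixel count, and most methods in the table land near that factor (TruFor at about 4.4x), while HiFi-Net grows by roughly 8.6x.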
Additionally, it seems that there is no standardization in current research on image manipulation detection regarding whether input images should be resized and what size they should be resized to. As a result, most comparisons are unfair, and the experimental conclusions are not reliable. This is a matter that deserves serious attention.
Regarding your second point, the lack of standardization in image resizing is indeed a chaotic aspect of the current research landscape. Highlighting and addressing this issue is one of the primary reasons we wrote this paper. By explicitly stating the resolutions used in different studies, we hope to bring attention to this problem and help the field achieve more reliable and consistent conclusions.
Thanks for the information, Han Chen. We have recently received similar questions and advice. Please refer to Issue #22 for our detailed explanation.
Thank you very much for your outstanding contributions in the field of image manipulation detection. Regarding the input image size, would you consider using a unified experimental setup across all models by resizing the images to 512×512 for the experiments? I believe this would be the fairest way to make comparisons.
Thanks for your attention to our project. Each network, however, has its own appropriate resolution and design, and the resulting complexities are within the same order of magnitude. Our strategy is therefore to 'stay true to the original work', implementing each model as closely as possible to the design presented in its original paper.
This is great work and has positive significance for the development of the IML community.