Hi! Your work is very interesting, but I have some questions about the testing datasets.
May I ask how the pixel-level F1 performance for image forgery localization is evaluated in Table 1 of the paper? Since you mentioned in another issue that you used the same subset as SPAN and evaluated on only 160 images for the NIST dataset, how did you evaluate the CASIA and Coverage datasets? SPAN's practice is to pre-train on a synthetic dataset and then, for CASIA, use CASIAv2 as the train split and CASIAv1 as the test split; for the Coverage dataset, a 75:25 training-testing split is used.
Could you tell us the specific number of images in each test dataset? Thank you!
The code used to compute the metrics is in test_docker/metrics.py. We compute F1 on both the heatmap and the inverted heatmap and take the maximum. Note that in this code the ground truth has to be 0 for real pixels and 1 for fake pixels (careful: DSO-1 has inverted ground truths).
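For readers who don't want to dig into the repository right away, here is a minimal sketch of that max-F1 protocol (this is not the code from test_docker/metrics.py; the function name `max_f1` and the 0.5 threshold are illustrative assumptions):

```python
# Minimal sketch of the evaluation described above: pixel-level F1 is computed
# on both the thresholded heatmap and its inverted version, and the larger
# value is kept. Ground-truth convention: 0 = real pixel, 1 = fake pixel.
import numpy as np
from sklearn.metrics import f1_score

def max_f1(heatmap: np.ndarray, gt_mask: np.ndarray, threshold: float = 0.5) -> float:
    """heatmap: float scores in [0, 1]; gt_mask: binary map (1 = fake pixel)."""
    gt = gt_mask.flatten().astype(int)
    pred = (heatmap.flatten() > threshold).astype(int)
    f1_direct = f1_score(gt, pred, zero_division=0)      # heatmap as-is
    f1_inverted = f1_score(gt, 1 - pred, zero_division=0)  # inverted heatmap
    return max(f1_direct, f1_inverted)
```

The actual thresholding and any per-dataset handling (e.g. the inverted DSO-1 ground truths) should be checked against test_docker/metrics.py, which is the authoritative implementation.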
We did not change the training set based on the test set, and we did not fine-tune on their training splits. As for CASIA, CASIA v2 is in the training set and CASIA v1+ is in the test set. CASIA v1+ is a version of CASIA v1 made by MVSS-Net++ in which the real images are drawn from the COREL dataset, so that there is no overlap with CASIA v2 (seen in training). Every other test dataset is evaluated in full, since we did not use a split for fine-tuning (except for OpenForensics and NIST, where we use a subset due to computational constraints).
The number of test images for each dataset is reported in Table 1 of the supplemental material or Table 8 of the arXiv version.
I added the lists of the images used in testing to the folder test_docker/data_test/