qcf-568 / DocTamper

[CVPR2023] Towards Robust Tampered Text Detection in Document Image: New Dataset and New Solution

For evaluation on other datasets #72

kasteric closed this issue 2 months ago

kasteric commented 2 months ago

Hi, I have several questions about the idea behind this work and about evaluation on custom datasets:

  1. Do you agree with either of these statements: "due to compression, the contrast between ps and non-ps regions is amplified, which helps the model recognize the ps region", or "since many real images are already compressed, we need to better understand how ps images behave after image compression"?
  2. I notice that on DocTamper the quant tables are provided and fixed. What about real-world images? If they are not compressed initially, do you think it is necessary to apply the compression augmentation?

qcf-568 commented 2 months ago

Thank you for your interest in our work. Here are the answers:

  1. I completely disagree with the first statement. Compression can only weaken the contrast between ps and non-ps regions, which makes it harder for the model to recognize the ps region. I agree with the second statement: a model that is robust to image compression is important.
  2. Compression augmentation is only necessary during training; during testing it is detrimental. For real-world images, you can read the quant tables directly with jpegio or PIL, or simply read them from the file header in the image metadata.
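
As an illustration of the last option (reading the quant tables straight from the file header), here is a minimal sketch in plain Python. It is not code from the DocTamper repository: the helper name `read_quant_tables` is hypothetical, and the parser assumes a standard JPEG stream in which the DQT segments (marker `0xFFDB`) appear before the start-of-scan marker. Values are returned in the zigzag order in which they are stored in the file, which may differ from the order a library such as jpegio or PIL reports.

```python
import struct


def read_quant_tables(data: bytes) -> dict:
    """Return {table_id: [64 values in file (zigzag) order]} from raw JPEG bytes.

    Hypothetical helper for illustration; only header segments are parsed.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")

    tables = {}
    pos = 2  # first byte after the SOI marker
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker: skip it
            pos += 1
            continue
        if marker in (0xDA, 0xD9):  # start of scan or end of image: header is over
            break

        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2 : pos + 4])
        segment = data[pos + 4 : pos + 2 + length]

        if marker == 0xDB:  # DQT: one or more tables packed back to back
            i = 0
            while i < len(segment):
                precision = segment[i] >> 4   # 0 = 8-bit entries, 1 = 16-bit entries
                table_id = segment[i] & 0x0F
                i += 1
                if precision == 0:
                    values = list(segment[i : i + 64])
                    i += 64
                else:
                    values = list(struct.unpack(">64H", segment[i : i + 128]))
                    i += 128
                if len(values) != 64:
                    raise ValueError("truncated quantization table")
                tables[table_id] = values

        pos += 2 + length  # jump to the next marker
    return tables
```

For a file on disk, this could be called as `read_quant_tables(open("doc.jpg", "rb").read())`, where `doc.jpg` is a placeholder path. An image that was never JPEG-compressed (for example a PNG scan) has no such tables, so the function raises `ValueError` on it.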

kasteric commented 2 months ago

That is a clear and straightforward explanation. Many thanks!