mjkwon2021 / CAT-Net

Official code for CAT-Net: Compression Artifact Tracing Network. Image manipulation detection and localization.

Transfer learning on a large dataset #50

Open szjyoo opened 1 month ago

szjyoo commented 1 month ago

Hello author. I am trying to train CAT-Net on the DocTamper dataset (120,000 images). Should I change self.smallest = 1869 to self.smallest = 120000 in data_core.py, or should I train on a subset of the full dataset in each round?

CauchyComplete commented 1 month ago

Hello :)

If you are adding the new DocTamper dataset (120k images) to the existing dataset setup, the smallest dataset is still IMD, so self.smallest should be 1869 (the number of images in IMD). If you are using only the DocTamper dataset without any other datasets, then it would be correct to set self.smallest to 120k. However, this would mean that 120k images are used in one epoch, which would take too long. Since the original training method of CAT-Net uses 1869*10 images per epoch, it might be a good idea to set self.smallest to 1869*10.
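
To make the trade-off above concrete, here is a minimal sketch of per-epoch subsampling. It is not the actual code in data_core.py: the class name EpochSubsetSampler, its shuffle method, and the seed parameter are hypothetical and only illustrate the idea of drawing `smallest` images from a larger dataset each epoch.

```python
import random


class EpochSubsetSampler:
    """Hypothetical helper (not part of CAT-Net): picks a fresh random
    subset of `smallest` indices from a larger dataset for each epoch."""

    def __init__(self, dataset_size, smallest, seed=0):
        if smallest > dataset_size:
            raise ValueError("smallest cannot exceed dataset_size")
        self.dataset_size = dataset_size
        self.smallest = smallest
        self._rng = random.Random(seed)
        self.indices = []
        self.shuffle()

    def shuffle(self):
        # Sample without replacement so no image repeats within one epoch.
        self.indices = self._rng.sample(range(self.dataset_size), self.smallest)

    def __len__(self):
        # One epoch visits exactly `smallest` images.
        return self.smallest

    def __getitem__(self, i):
        return self.indices[i]


# DocTamper-only setup from this thread: 120k images, 1869*10 per epoch.
sampler = EpochSubsetSampler(dataset_size=120_000, smallest=1869 * 10)

# Epochs needed to draw at least as many samples as the dataset holds
# (ceiling division). Random draws do not guarantee full coverage.
epochs_to_cover = -(-sampler.dataset_size // len(sampler))
```

Calling sampler.shuffle() at the start of every epoch would give each epoch a different subset, so the whole dataset is seen over time while each epoch stays short.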

szjyoo commented 1 month ago

Thank you very much for your answer. I'm using only DocTamper as a dataset. My validation and test sets contain 10,000 and 30,000 images respectively. Considering both training efficiency and performance, I would like to know whether setting self.smallest to 10,000 or to 1869*10 will give better results. Looking forward to your answer.

1513691610 commented 1 month ago

Hello, I am also training CAT-Net on DocTamper. Could you share your contact information so we can discuss it together? Thank you.