megvii-research / CADDM

Official implementation of ID-unaware Deepfake Detection Model
Apache License 2.0

Some questions on "Implicit Identity Leakage" #3

Closed · LOOKCC closed this issue 1 year ago

LOOKCC commented 1 year ago

Thank you for the new insight into deepfake detection! I have some questions to fully understand "Implicit Identity Leakage". The FF++ real dataset has 1000 videos, and the authors split them into 720, 140, and 140 videos for train, val, and test. Taking the training set as an example, the fake videos used for training are generated by swapping faces among videos within the training set. Therefore, on FF++, the face identities of the training set and the test set do not overlap. So if the network has learned many identity-related features, shouldn't the in-dataset test AUC on FF++ also decrease, since these face identities were never seen during training?
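A minimal sketch of the identity-disjoint split described above. The video IDs here are hypothetical stand-ins for the 1000 FF++ real videos; the actual official split files are distributed with the FaceForensics++ dataset.

```python
# Hypothetical stand-ins for the 1000 FF++ real-video IDs (000..999).
real_videos = [f"{i:03d}" for i in range(1000)]

train = real_videos[:720]      # 720 videos for training
val = real_videos[720:860]     # 140 videos for validation
test = real_videos[860:]       # 140 videos for testing

# Fake training videos are produced by swapping faces *within* the training
# split, so no test-set identity appears during training.
assert set(train).isdisjoint(val)
assert set(train).isdisjoint(test)
print(len(train), len(val), len(test))  # 720 140 140
```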

Nku-cs-dsc commented 1 year ago

"Implicit Identity Leakage" means that the binary classifier captures identity information during the training phase. But it does not mean that the classifier makes judgments based only on such identity information; it also uses the specific artifact features learned from the particular manipulation methods. During in-dataset evaluation, the test set is generated by the same manipulation methods and contains the same specific artifacts as the training set, so the classifier achieves high performance by detecting those artifacts. However, in cross-dataset evaluation, the classifier cannot detect the novel artifact features in the new test set, and it misuses the identity information to make false judgments.
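The argument above can be illustrated with a deliberately simplified toy (this is not the paper's model; the cues and weights are made up for illustration): a classifier that scores "fakeness" from both a learned artifact cue and a leaked identity cue behaves correctly in-dataset, but once the artifact cue stops firing on unseen manipulation methods, the identity cue alone drives the prediction.

```python
def score_fake(has_known_artifact: bool, identity_cue_fires: bool) -> float:
    """Toy fakeness score from two learned cues (illustrative weights only)."""
    artifact_w, identity_w = 0.9, 0.4  # hypothetical learned weights
    return artifact_w * has_known_artifact + identity_w * identity_cue_fires

# In-dataset: same manipulation method, so the known artifact is present
# and dominates the score.
in_dataset = score_fake(has_known_artifact=True, identity_cue_fires=True)

# Cross-dataset: the novel artifact goes undetected, so only the leaked
# identity cue contributes -- the basis for false judgments.
cross_dataset = score_fake(has_known_artifact=False, identity_cue_fires=True)

print(in_dataset, cross_dataset)  # 1.3 0.4
```

The point of the toy: the cross-dataset score is produced entirely by the identity cue, which is exactly the misuse the comment describes.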

For more details, please see Sections 3 and 5 of the paper "Implicit Identity Leakage: The Stumbling Block to Improving Deepfake Detection Generalization".