yzhang2016 opened this issue 4 years ago
No, I did not dive deep into this.
I followed the paper for re-implementation. I got good results on DF and FS, but poor results on F2F and NT. Did you encounter a similar situation?
The implementation in my repo fails to generalize; can I ask you for some advice?
My implementation did not generalize well on F2F and NT when training on the constructed BI database.
When training on my constructed BI database, the loss drops quickly from 1000 to 10 in the first 2000 iterations without freezing the pretrained HRNet-W18 parameters. During evaluation, the model seems to have learned only my blending fingerprint, so training leads to heavy overfitting. I'm puzzled by that. Did you apply data augmentation or any trick not mentioned in the paper? I think my implementation strictly follows the original paper. Thanks.
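For anyone reproducing this, here is a minimal PyTorch sketch of the freeze-then-unfreeze schedule I'm referring to. The toy model and the `backbone`/`head` attribute names are just placeholders standing in for HRNet plus the X-ray head, not the paper's code:

```python
import torch.nn as nn

class ToyXrayModel(nn.Module):
    """Toy stand-in for backbone + head; not the paper's architecture."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Conv2d(3, 16, 3, padding=1)  # pretend it is pretrained
        self.head = nn.Conv2d(16, 1, 1)                 # freshly initialised head

    def forward(self, x):
        return self.head(self.backbone(x))

def set_backbone_trainable(model, trainable: bool) -> None:
    # Toggle requires_grad on the backbone parameters only.
    for p in model.backbone.parameters():
        p.requires_grad = trainable

model = ToyXrayModel()
set_backbone_trainable(model, False)  # warm-up: train the new head only
# ... run a few thousand warm-up iterations ...
set_backbone_trainable(model, True)   # then fine-tune end-to-end
```

Warming up the head first avoids large gradients from its random initialization washing out the pretrained features; whether the authors did this is exactly what I'm asking.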
For data augmentation, random noise and blurring are used.
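For concreteness, a minimal sketch of what such an augmentation could look like. The probabilities, noise sigma, and kernel sizes below are my guesses, not values from the paper:

```python
import cv2
import numpy as np

def augment(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Random additive Gaussian noise and Gaussian blur, each with prob. 0.5.
    `img` is a uint8 HxWx3 face crop. Strengths here are assumptions."""
    out = img.astype(np.float32)
    if rng.random() < 0.5:                      # additive Gaussian noise
        sigma = rng.uniform(1.0, 5.0)
        out = out + rng.normal(0.0, sigma, out.shape)
    if rng.random() < 0.5:                      # Gaussian blur
        k = int(rng.choice([3, 5, 7]))
        out = cv2.GaussianBlur(out, (k, k), 0)
    return np.clip(out, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
dummy = np.full((256, 256, 3), 128, np.uint8)
aug = augment(dummy, rng)
```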
Did you align the results of your implementation with those reported in the paper?
Hi, @yzhang2016 @jerry4h. My implementation also heavily overfits the BI dataset. I also notice that in my experiments the noise difference in fake images from the BI dataset is much more obvious than in those from DF, which may explain the overfitting. The tool I used is the one introduced in Fig. 2 of the original paper. My frames are all extracted from c23 videos. Could the reason be that the videos are compressed, so details are lost? I have not downloaded the raw videos because they take much more disk space.
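To make "noise difference" concrete: one crude way to eyeball such a fingerprint is a high-frequency residual (this is just an illustration, not the paper's method; `bi_crop` is a hypothetical name for one of my BI samples):

```python
import cv2
import numpy as np

def noise_residual(img: np.ndarray) -> np.ndarray:
    """High-frequency residual: the image minus its Gaussian-blurred version."""
    f = img.astype(np.float32)
    return f - cv2.GaussianBlur(f, (5, 5), 0)

# e.g. compare np.abs(noise_residual(bi_crop)).mean() against the same
# statistic for a Deepfakes crop; a large gap hints at a blending fingerprint.
```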
I don't think this is caused by not using the raw videos. In most face forgery detection works, the high-quality (c23) videos are used for training. In my case, the distribution of the generated BI database seems closer to those of DF and FS than to those of F2F and NT.
Thank you for your reply. I notice that in the limitation section of the paper the authors say, "We test our framework on the HQ version (a light compression) and the LQ version (a heavy compression) of FF++ dataset and the overall AUC are 87.35% and 61.6% respectively." It seems that the compression level matters. Nevertheless, I'll keep trying on c23 images.
Could you tell me what your accuracy is on c23 Deepfakes and Face2Face?
Hi, @yzhang2016. When I reimplemented Face X-ray, I also encountered an overfitting problem on the generated data, a situation similar to @jerry4h's. After training on the BI dataset, the model only caught the blending fingerprint on the BI evaluation set but failed to detect the blending boundaries of the deepfakes in the FF++ c23 dataset. I checked some examples in my BI dataset, and the generated fake faces are hard for me to distinguish, so I think the generated data is fine; the reason for the poor generalization may be that the blending operation is far from the synthesis process in FF++. Can you tell me the detailed parameters of your experiment? I also noticed you used random noise and blurring to augment data; was this applied to the foreground face or to the whole generated image? Hoping for your reply, thank you.
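For reference, the core blending step the paper describes, in a minimal NumPy sketch (mask deformation, color correction, and landmark-based warping omitted; the dummy inputs are just so it runs):

```python
import numpy as np

def blend(foreground, background, mask):
    """BI step from the paper: I = M*I_F + (1-M)*I_B, with the face X-ray
    label B = 4*M*(1-M). `mask` is a soft mask in [0, 1], HxWx1; the
    images are float32 HxWx3 in [0, 255]."""
    blended = mask * foreground + (1.0 - mask) * background
    xray = 4.0 * mask * (1.0 - mask)   # peaks at the blending boundary
    return blended, xray[..., 0]

h, w = 256, 256
fg = np.random.default_rng(0).random((h, w, 3)).astype(np.float32) * 255
bg = np.random.default_rng(1).random((h, w, 3)).astype(np.float32) * 255
m = np.zeros((h, w, 1), np.float32)
m[64:192, 64:192] = 1.0
# In practice the mask is randomly deformed and Gaussian-blurred, so the
# X-ray forms a soft band along the boundary instead of a hard edge.
blended, xray = blend(fg, bg, m)
```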
@yzhang2016, @ChineseboyLuo, @jerry4h Hi guys, do you mind sharing your neural network architecture, specifically the init and forward functions? (It was called NNb in the paper.) It would be very appreciated and helpful.
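In case a starting point helps, this is my best guess at that architecture using timm's HRNet-W18: multi-scale features upsampled to the largest resolution, concatenated, a 1x1 conv for the X-ray map, and a classifier on the pooled map. Purely a sketch of what I imagine NNb looks like, not anyone's verified code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import timm

class FaceXrayNet(nn.Module):
    """Guess at an NNb-style head on an HRNet-W18 trunk (not the authors' code)."""
    def __init__(self):
        super().__init__()
        self.backbone = timm.create_model('hrnet_w18', pretrained=False,
                                          features_only=True)
        chs = self.backbone.feature_info.channels()   # per-stage channel counts
        self.xray_head = nn.Conv2d(sum(chs), 1, kernel_size=1)
        self.cls_head = nn.Linear(1, 2)               # real/fake from pooled X-ray

    def forward(self, x):
        feats = self.backbone(x)
        size = feats[0].shape[-2:]                    # highest-resolution stage
        feats = [F.interpolate(f, size=size, mode='bilinear',
                               align_corners=False) for f in feats]
        xray = torch.sigmoid(self.xray_head(torch.cat(feats, dim=1)))
        logits = self.cls_head(xray.mean(dim=(2, 3)))
        return xray, logits

net = FaceXrayNet()
xray, logits = net(torch.randn(1, 3, 256, 256))
```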
Hello, may I ask you some questions about Face X-ray? I've followed your GitHub ID; can you tell me your email?
Hi folks! Have you had the chance to overcome the generalization problem, with results similar to the original paper's?
Hi @gleonato,
I did not manage to get similar results on the Deepfake Detection Challenge. However, what I noticed is that how the deepfakes for this paper are generated is very important: if your generated deepfakes differ from the test data, the model will not generalize.