nii-yamagishilab / Capsule-Forensics-v2

Implementation of the Capsule-Forensics-v2
BSD 3-Clause "New" or "Revised" License

Confusion about dataset split #13

Closed · wasim004 closed this issue 1 year ago

wasim004 commented 2 years ago

Hi, can you please explain Table 1? What do 720×3 vids, 140×3 vids, and 140×3 vids represent for the train, val, and test sets? I don't understand the video split.

Secondly, can you please explain how you handled the image data and the video data? How did you place them in folders to prepare the dataloader?

Also, to confirm: did you extract frames from the videos, reassemble those frames back into videos, and then use test_binary_vid_ffpp.py or test_multiclass_vid_ffpp.py to evaluate the model on the test videos?

Thanks!

honghuy127 commented 2 years ago

Hi,

The train/val/test split annotations are from here. The ×3 means we used three compression levels for each video: c0, c23, and c40.
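For concreteness, a small sketch of how the counts in Table 1 come out, assuming the official 720/140/140 FF++ video split:

```python
# Sketch of the FF++ split used in Table 1, assuming the official
# 720/140/140 per-split video counts and three compression levels.
splits = {"train": 720, "val": 140, "test": 140}
compressions = ["c0", "c23", "c40"]

# Each split counts every video once per compression level,
# so "720 x 3" in the table means 720 videos x 3 compressions.
videos_per_split = {name: n * len(compressions) for name, n in splits.items()}
# videos_per_split -> {'train': 2160, 'val': 420, 'test': 420}
```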

All of our experiments used video frames extracted by these scripts. As for test_binary_vid_ffpp.py and test_multiclass_vid_ffpp.py, these scripts operate on image frames, not video files. Frames from the same video share similar names; they differ only in the frame counter.

wasim004 commented 2 years ago

Hi,

Thanks for your kind response. I got it now.

Thanks!

wasim004 commented 2 years ago

Hi, for image classification, did you follow a directory structure like train/0_original/*.jpeg? My directory structure is train|test|validation / 0_original|1_deepfake|etc. / video-folder / frames-of-that-video, e.g. /databases/faceforensicspp/train/0_original/000/frame_det_00_000001.jpeg

Thanks!

honghuy127 commented 2 years ago

For the dataloader, we used torchvision.datasets.ImageFolder, so you need to put the images directly inside 0_original, without subfolders.
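To make that concrete, the flat layout ImageFolder expects might look like the sketch below. The prefixed file names are only an illustration of one way to keep frames of the same video distinguishable after flattening; ImageFolder itself only requires images directly under each class folder:

```text
faceforensicspp/
└── train/
    ├── 0_original/
    │   ├── 000_frame_det_00_000001.jpeg
    │   ├── 000_frame_det_00_000002.jpeg
    │   └── ...
    └── 1_deepfake/
        └── ...
```

ImageFolder assigns class indices by sorting the class folder names, so 0_original maps to label 0 and 1_deepfake to label 1.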

You can also see the discussion here.

wasim004 commented 2 years ago

Hi, thank you for the response! So for videos, did you also place the images directly inside 0_original, 1_deepfake, etc.? If yes, how does the model know which video each frame belongs to? Thanks!

honghuy127 commented 2 years ago

Yes, we treat a video as a sequence of numbered images. The scripts test_vid_binary_ffpp.py and test_vid_multiclass_ffpp.py decide which video the images belong to based on their names.
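A minimal sketch of that name-based grouping. The filename pattern here is an assumption (a video prefix followed by a frame counter); the actual pattern used by the repo's scripts may differ:

```python
import re
from collections import defaultdict

def group_frames_by_video(filenames):
    """Map a video key -> sorted list of its frame files.

    Assumes hypothetical names like '<video_id>_frame_det_00_<counter>.jpeg',
    where only the trailing counter differs between frames of one video.
    """
    groups = defaultdict(list)
    for name in filenames:
        # Strip the trailing frame counter so frames of one video share a key.
        key = re.sub(r"_\d+\.jpe?g$", "", name)
        groups[key].append(name)
    return {k: sorted(v) for k, v in groups.items()}

frames = [
    "000_frame_det_00_000002.jpeg",
    "000_frame_det_00_000001.jpeg",
    "001_frame_det_00_000001.jpeg",
]
groups = group_frames_by_video(frames)
# groups has two keys: '000_frame_det_00' (2 frames) and '001_frame_det_00' (1 frame)
```

Video-level scores can then be obtained by averaging the per-frame predictions within each group.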

wasim004 commented 2 years ago

Fine, thank you! By the way, can you please explain the Real column in the table below? What do these scores show? Thanks!

(screenshot of the results table)

honghuy127 commented 2 years ago

Real means original images. The numbers are accuracies; for instance, in the 'real' column, they are the ratios of correct 'real' predictions to all 'real' samples.
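That per-class accuracy can be sketched as follows (toy labels, not the paper's data):

```python
# Per-class accuracy as described above: for the 'real' column, correct
# 'real' predictions divided by the total number of 'real' samples.
def per_class_accuracy(labels, preds, cls):
    total = sum(1 for y in labels if y == cls)
    correct = sum(1 for y, p in zip(labels, preds) if y == cls and p == cls)
    return correct / total if total else 0.0

labels = ["real", "real", "fake", "real", "fake"]
preds  = ["real", "fake", "fake", "real", "real"]
# 2 of the 3 'real' samples are predicted 'real' -> accuracy 2/3
```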

wasim004 commented 2 years ago

Fine, Thanks!

wasim004 commented 1 year ago

Hi,

May I get the code for visualizing the capsule layers using Grad-CAM or Guided Backpropagation in your case? I need to select the layer features of the capsule network. Thanks!

honghuy127 commented 1 year ago

Hi,

I used the code from this repository for visualization. You will need to modify the model code to make it work.
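Independent of that repository, a minimal Grad-CAM-style sketch using PyTorch hooks is shown below; the target-layer choice for Capsule-Forensics is an assumption you would adapt to the capsule feature extractor, and the toy CNN here is only a stand-in for the real model:

```python
import torch
import torch.nn as nn

class GradCAM:
    """Minimal Grad-CAM sketch: hook a layer, weight its activations
    by the averaged gradients of a class score, then apply ReLU."""

    def __init__(self, model, target_layer):
        self.model = model
        self.activations = None
        self.gradients = None
        target_layer.register_forward_hook(self._save_activation)
        target_layer.register_full_backward_hook(self._save_gradient)

    def _save_activation(self, module, inp, out):
        self.activations = out.detach()

    def _save_gradient(self, module, grad_in, grad_out):
        self.gradients = grad_out[0].detach()

    def __call__(self, x, class_idx):
        self.model.zero_grad()
        scores = self.model(x)
        scores[:, class_idx].sum().backward()
        # Channel weights = spatially averaged gradients.
        weights = self.gradients.mean(dim=(2, 3), keepdim=True)
        cam = torch.relu((weights * self.activations).sum(dim=1))
        return cam  # (N, H, W) heatmap at the target layer's resolution

# Toy usage on a small CNN stand-in (not the actual capsule model):
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2))
cam = GradCAM(model, model[0])
heatmap = cam(torch.randn(1, 3, 32, 32), class_idx=1)
```

For the capsule model you would pass the convolutional feature extractor's last layer as `target_layer` and inspect the resulting heatmap per input frame.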