polimi-ispl / icpr2020dfdc

Video Face Manipulation Detection Through Ensemble of CNNs
GNU General Public License v3.0

About test file #2

Closed zhangconghhh closed 4 years ago

zhangconghhh commented 4 years ago

I have some questions about the test code. Can you help me?

  1. In test_model.py, the face_policy, patch_size, net_name and model_name are parsed from the model path, but the models you provide contain no information about the patch_size.
  2. In test_all.sh, the paths for both DFDC and FF++ are required as input. Can I run the test code on just one dataset, such as DFDC?
  3. When generating the DFDC dataset, line 55 of index_dfdc.py reads df_tmp['folder'] = int(str(jsonpath.parts[-2]).split('_')[-1]). But the DFDC dataset I downloaded from the website has two folders (test_videos, train_sample_videos), so I couldn't get df_tmp['folder'].
CrohnEngineer commented 4 years ago

Hey @zhangconghhh ,

  1. You're right, we forgot to insert the training hyperparameters in the models' folders. Anyway, you can find them in the paper: if I'm not mistaken, patch_size should be equal to 224, and face_policy should be equal to scale. Just to clear things up, when running test_model.py these parameters are:

    • either extracted from the complete path to your model weights, specified as the model_path argument. This happens only if you trained your model with train_binclass.py, since that script, when saving the model weights during training, creates a folder whose name includes all the hyperparameters needed for testing. In that case patch_size, face_policy, net_name, etc. are parsed by test_model.py directly from the folder name, without reading them from the argument parser;
    • or, if these hyperparameters are not present in the folder name, the script reads them from the argument parser, so you need to specify each of them explicitly.
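As a rough illustration of the idea, a script can recover hyperparameters from a tagged weights-folder name along these lines. Note the tag format below is made up for the sketch, not the repository's actual naming scheme:

```python
from pathlib import Path

# Illustrative sketch only: the real train_binclass.py naming scheme may
# differ, but the idea is that each hyperparameter is a "key-value" tag
# embedded in the weights folder name.
def parse_model_tag(model_path: str) -> dict:
    folder = Path(model_path).parent.name  # e.g. "net-EfficientNetB4_face-scale_size-224"
    params = {}
    for token in folder.split('_'):
        key, _, value = token.partition('-')
        params[key] = value
    return params

params = parse_model_tag('weights/net-EfficientNetB4_face-scale_size-224/bestval.pth')
print(params)  # {'net': 'EfficientNetB4', 'face': 'scale', 'size': '224'}
```

If any expected tag is missing from the folder name, the script would fall back to the values given on the command line, as described above.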
  2. You can run the test on a single dataset. If I'm not wrong, the path for a dataset is required only if you specify that dataset as a testdb, e.g. you need to specify the path to the DFDC extracted faces and Pandas DataFrame only if you want to test your model on DFDC. The script raises a RuntimeError only if you do not specify the path for the dataset you are testing on, for instance if you indicate FF++ as testdb but specify neither ffpp_faces_df_path nor ffpp_faces_dir.
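In simplified form, the check amounts to something like this (a sketch, not the actual repo code; the argument names are assumptions):

```python
# Simplified sketch of the kind of validation test_model.py performs:
# a faces directory is only mandatory for the dataset actually selected
# as testdb. Function and argument names here are assumptions.
def check_dataset_paths(testdb: str, dfdc_faces_dir=None, ffpp_faces_dir=None):
    if testdb.startswith('dfdc') and dfdc_faces_dir is None:
        raise RuntimeError('Path to DFDC faces required when testing on DFDC')
    if testdb.startswith('ff') and ffpp_faces_dir is None:
        raise RuntimeError('Path to FF++ faces required when testing on FF++')

# Testing on DFDC only: no FF++ paths needed, so this passes
check_dataset_paths('dfdc', dfdc_faces_dir='/data/dfdc_faces')
```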

  3. When we wrote the code we didn't have the test_videos folder from the DFDC dataset. I think all the videos we used are the ones contained in the train_sample_videos folder. Try running the index_dfdc.py script specifying the path to train_sample_videos on your machine as the DFDC path.

Let us know if you have any other issues! Cheers

Edoardo

zhangconghhh commented 4 years ago

In your ICPR paper, you mention the test result on the DFDC dataset is 0.8782. Is the test dataset in your paper the test_videos part, or the last 10 videos in train_sample_videos? I'm asking because I saw there is a separation in split.py.

zhangconghhh commented 4 years ago

And when I run make_dataset.sh for the DFDC dataset, I also run into a problem. I used the path to train_sample_videos as the DFDC path, and I got the json_path as PosixPath('/media/disk/Backup/zhangcong/deepfake/dfdc/train_sample_videos/metadata.json'). I can't get an int for df_tmp['folder']. Can you help me with that?

CrohnEngineer commented 4 years ago

Hey @zhangconghhh ,

In your ICPR paper, you mention the test result on the DFDC dataset is 0.8782. Is the test dataset in your paper the test_videos part, or the last 10 videos in train_sample_videos? I'm asking because I saw there is a separation in split.py.


As you can read in the paper, we used only the videos from the training set for training, validation and testing. In particular, we used the videos from the first 35 folders as training set, videos from folders 35 to 39 as validation set, and finally videos from the last 10 folders as test set. At the time the paper was written, the challenge hadn't closed yet, so we didn't have at hand the videos from the test_videos folder you cited previously in your comments.
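The folder-based split described above can be sketched like this (the helper name is mine for illustration, not taken from the repository):

```python
# Sketch of the DFDC split described above: training-set folders 0-34 for
# training, 35-39 for validation, 40-49 for testing.
def dfdc_split(folder: int) -> str:
    if 0 <= folder <= 34:
        return 'train'
    if 35 <= folder <= 39:
        return 'val'
    if 40 <= folder <= 49:
        return 'test'
    raise ValueError(f'DFDC training set has folders 0-49, got {folder}')

print([dfdc_split(f) for f in (0, 34, 35, 39, 40, 49)])
# ['train', 'train', 'val', 'val', 'test', 'test']
```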

And when I run make_dataset.sh for the DFDC dataset, I also run into a problem. I used the path to train_sample_videos as the DFDC path, and I got the json_path as PosixPath('/media/disk/Backup/zhangcong/deepfake/dfdc/train_sample_videos/metadata.json'). I can't get an int for df_tmp['folder']. Can you help me with that?

I'm sorry, I think I didn't understand your question. In make_dataset.sh from the last release we execute first index_dfdc.py and then extract_faces.py. In index_dfdc.py, however, you just need to specify the path to the folder containing the 50 folders of videos of the DFDC training set, already unzipped. The metadata.json file the script processes is then taken from each one of the 50 folders and used to create an overall Pandas DataFrame with info about the whole dataset; you don't need to specify any json path anywhere in the code. Let me recap this quickly. Before launching make_dataset.sh:

  1. You should have downloaded the DFDC dataset from Kaggle;
  2. You should have unzipped all 50 folders contained in the train_sample_videos folder;
  3. You should run index_dfdc.py indicating as argument the path to the folder containing the 50 folders of the DFDC training set.
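To see why pointing the script at train_sample_videos directly fails, here is the parsing step in isolation, reconstructed from the line you quoted (treat it as a sketch): the folder number comes from the parent directory name of each metadata.json, so that name must end with an integer.

```python
from pathlib import Path

# Reconstruction of the parsing step from index_dfdc.py line 55:
# the folder number is taken from the parent directory of metadata.json.
def folder_number(metadata_path: str) -> int:
    jsonpath = Path(metadata_path)
    return int(str(jsonpath.parts[-2]).split('_')[-1])

# Works when the parent folder ends with a number:
print(folder_number('/data/dfdc/dfdc_train_part_42/metadata.json'))  # 42

# Fails with a ValueError when the parent is train_sample_videos,
# because "videos" cannot be converted to int.
```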

Hope this helps. Have a good weekend! Cheers

Edoardo