top accuracy is 65.18% ?

niyunsheng commented 3 years ago

https://ai.facebook.com/blog/deepfake-detection-challenge-results-an-open-initiative-to-advance-ai/

The webpage says that the top accuracy on the private evaluation dataset is about 65.18%, but when I evaluated the model you provided, I got an accuracy of about 89%. What's wrong?

Looking forward to any reply.

selimsef commented 3 years ago

Facebook did not share the private test dataset. The private dataset comes from a different distribution, which is why the quality there is much lower. On the public test dataset, of course, it should be much better.

GuilinPang commented 3 years ago

Hello, how can I calculate the accuracy after getting submission.csv? Is there an official formula or code? Thank you!

GuilinPang commented 3 years ago

I don't know why the labels in sample_submission.csv (downloaded from https://www.kaggle.com/c/deepfake-detection-challenge/data) are all zero. Is something wrong?

enquestor commented 3 years ago

@GuilinPang I guess that's because it's only a sample submission, there for you to check the format.

Here is the code I used to calculate the official loss on the train dataset. I can't guarantee it's right, but at least it's something to build on.

import json
import argparse
import pandas as pd
from glob import glob
from math import log

# Clip predictions away from 0 and 1 so log() never receives 0.
# The value 1e-15 is an assumption, not taken from the competition rules.
EPS = 1e-15

if __name__ == '__main__':
    parser = argparse.ArgumentParser('python score.py')
    arg = parser.add_argument
    # Each path may be a glob pattern; the defaults are used when a flag is omitted.
    arg('--metadata-json', type=str, default='./metadata.json', help='Path of metadata file')
    arg('--submission-csv', type=str, default='./submission.csv', help='Path of submission file')
    args = parser.parse_args()
    metadata_path = args.metadata_json
    submission_path = args.submission_csv

    # Map each video filename to its ground-truth label: 1.0 for FAKE, 0.0 otherwise.
    videos = {}

    for json_path in glob(metadata_path):
        with open(json_path, 'r') as f:
            metadata = json.load(f)
        for k, v in metadata.items():
            videos[k] = 1.0 if v['label'] == 'FAKE' else 0.0

    for csv_path in glob(submission_path):
        submission = pd.read_csv(csv_path)

        n = len(submission)
        logloss = 0.0
        for _, row in submission.iterrows():
            filename, label = row['filename'], row['label']
            y = videos[filename]
            yh = min(max(label, EPS), 1 - EPS)
            # Binary cross-entropy term for this video.
            logloss += y * log(yh) + (1 - y) * log(1 - yh)
        logloss /= (-n)

        print('loss: {loss}'.format(loss=logloss))
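
For the accuracy question above, the same labels and predictions can also be scored without the row loop. This is only a sketch under assumptions: the `score` helper, the 0.5 decision threshold, the clipping value `eps`, and the toy filenames are all made up for illustration, and none of it is the competition's official scoring code.

```python
import numpy as np
import pandas as pd


def score(labels, preds, eps=1e-15, threshold=0.5):
    """Return (log loss, accuracy) for 0/1 labels and predicted FAKE probabilities.

    eps and threshold are illustrative assumptions, not official values.
    """
    y = labels.to_numpy(dtype=float)
    # Clip so that log() never sees exactly 0 or 1.
    p = np.clip(preds.to_numpy(dtype=float), eps, 1 - eps)
    logloss = float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))
    # A video counts as predicted FAKE when its probability reaches the threshold.
    accuracy = float(np.mean((p >= threshold) == (y == 1)))
    return logloss, accuracy


# Toy data standing in for the metadata.json labels and submission.csv rows.
truth = pd.Series({'a.mp4': 1.0, 'b.mp4': 0.0, 'c.mp4': 1.0, 'd.mp4': 0.0})
submission = pd.DataFrame({
    'filename': ['a.mp4', 'b.mp4', 'c.mp4', 'd.mp4'],
    'label': [0.9, 0.2, 0.4, 0.0],
})

# Align the ground truth to the submission order by filename.
y = submission['filename'].map(truth)
loss, acc = score(y, submission['label'])
print('loss: {:.4f}, accuracy: {:.2f}'.format(loss, acc))
```

Note that accuracy depends on the chosen threshold while log loss does not, so the two numbers can rank models differently.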

selimsef / dfdc_deepfake_challenge