KeyError: 'test' in dataframe of celeb-df faces

chinmaynehate commented 3 years ago

Hey,

For the Celeb-DF V2 dataset (EfficientNetB4ST), I first ran index_celebdf.py and extract_faces.py, which produced the faces and the faces dataframe.

After that, I made the following changes in train_triplet.py:

- Added arguments for celebdf_faces_dir and celebdf_faces_df_path.
- Passed the CelebDF faces directory and faces dataframe to the make_splits() call at line 226 of train_triplet.py.
- Also modified the load_df() function at line 28 of isplutils/split.py to read the CelebDF faces dataframe.

After this, I ran train_triplet.py for 2 iterations only, using:

python3 train_triplet.py --net EfficientNetB4 --traindb celebdf --valdb celebdf --celebdf_faces_df_path facesdf/faces_df.pkl --celebdf_faces_dir faces/ --face scale --size 224 --workers 0 --traintriplets 70 --valtriplets 20 --maxiter 2 --valint 1

Output:

Loaded pretrained weights for efficientnet-b4
Parameters
{'face': 'scale',
 'net': 'EfficientNetB4',
 'seed': 0,
 'size': 224,
 'traindb': 'celebdf'}
Tag: net-EfficientNetB4_traindb-celebdf_face-scale_size-224_seed-0
Loading data
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2890, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'test'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train_triplet.py", line 473, in <module>
    main()
  File "train_triplet.py", line 241, in main
    ffpp_dir=ffpp_faces_dir, celebdf_dir=celebdf_faces_dir, dbs={'train': train_datasets, 'val': val_datasets})
  File "/home/jupyter/celebdf_detection/isplutils/split.py", line 135, in make_splits
    split_df = get_split_df(df=full_df, dataset=split_db, split=split_name)
  File "/home/jupyter/celebdf_detection/isplutils/split.py", line 94, in get_split_df
    df[(df['label'] == False) & (df['test'] == False) ]['video'].unique())
  File "/opt/conda/lib/python3.7/site-packages/pandas/core/frame.py", line 2975, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/opt/conda/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2892, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'test'

In short: at line 91 of isplutils/split.py I get a KeyError for df['test'], because the CelebDF faces dataframe has no 'test' column. You can verify this (for the Celeb-DF dataset) by running:

import pandas as pd
df = pd.read_pickle("celebdf/facesdf/faces_df.pkl")
print(df['test'])
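A slightly more informative check is to list every column the split code expects but the dataframe lacks, instead of hitting one KeyError at a time. The helper below is a minimal sketch: the name `missing_columns` is hypothetical, the required column list (`video`, `label`, `test`, `original`) is an assumption based on the fields mentioned in this thread, and a tiny in-memory dataframe stands in for faces_df.pkl.

```python
import pandas as pd


def missing_columns(df: pd.DataFrame, required: list) -> list:
    """Return the required column names that are absent from df (hypothetical helper)."""
    return [col for col in required if col not in df.columns]


# Toy stand-in for the faces dataframe; in practice this would be
# pd.read_pickle("celebdf/facesdf/faces_df.pkl").
faces_df = pd.DataFrame(
    {
        "video": ["Celeb-real/id58_0003.mp4", "Celeb-real/id58_0003.mp4"],
        "label": [False, False],
    },
    index=pd.Index(
        [
            "Celeb-real/id58_0003.mp4/fr000_subj0.jpg",
            "Celeb-real/id58_0003.mp4/fr016_subj0.jpg",
        ],
        name="facepath",
    ),
)

# Assumed set of columns used by the split code discussed in this thread.
required = ["video", "label", "test", "original"]
print(missing_columns(faces_df, required))  # ['test', 'original']
```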

I was able to run the code (train_triplet.py) by removing (df['test'] == False) at line 91 of isplutils/split.py.
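Rather than deleting the condition outright, the filter could apply `df['test'] == False` only when that column exists, so the same expression still works for dataframes that do carry a test flag. This is a hypothetical sketch of the filtering idea, not the repository's code; `real_non_test_videos` is an invented name.

```python
import pandas as pd


def real_non_test_videos(df: pd.DataFrame):
    """Unique videos with label == False, excluding test videos only if a 'test' column exists."""
    mask = df["label"] == False  # mirrors the original expression
    if "test" in df.columns:
        mask = mask & (df["test"] == False)
    return df[mask]["video"].unique()


# Usage on a toy dataframe without a 'test' column, then with one.
df = pd.DataFrame(
    {
        "video": ["a.mp4", "a.mp4", "b.mp4", "c.mp4"],
        "label": [False, False, True, False],
    }
)
print(sorted(real_non_test_videos(df)))  # ['a.mp4', 'c.mp4']
df["test"] = [False, False, False, True]
print(sorted(real_non_test_videos(df)))  # ['a.mp4']
```

The trade-off is that a silently skipped condition can let test videos leak into training if the column was supposed to be there, which is why regenerating the dataframe with the missing fields (as done later in this thread) is the more robust fix.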

I made similar changes to train_binclass.py and was able to train the model with it as well.

However, when running test_model.py for CelebDF, I again got the KeyError for 'test', this time while building the test split at line 99 of isplutils/split.py.

Is there a way to solve this when running test_model.py?

Thank you

nicobonne commented 3 years ago

This change should preserve the required columns for Celeb-DF and solve this issue.

chinmaynehate commented 3 years ago

Hey,

For the CelebDF-V2 dataset, I used the latest code to create the faces dataframe. The 'test' field issue is solved, but now I get a KeyError for the 'original' field.

You can verify this by running:

import pandas as pd
df = pd.read_pickle("facesdf/faces_df.pkl")
print(df['original'])
print("\n")
print(df["test"])

Strangely, if you remove 'original' from the fields_to_preserve_from_video list at line 190 of extract_faces.py and rerun extract_faces.py, the faces dataframe contains both the 'original' and 'test' fields.
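For context, fields such as 'original' and 'test' are per-video attributes that need to be copied onto every face row extracted from that video. The sketch below shows one way to do this with a pandas join, assuming a videos dataframe indexed by video path and a faces dataframe with a 'video' column. It only illustrates the idea and is not the actual logic of extract_faces.py; the helper name `attach_video_fields` and all values are made up, and the `fields` argument plays the role of fields_to_preserve_from_video.

```python
import pandas as pd


def attach_video_fields(faces_df: pd.DataFrame, videos_df: pd.DataFrame, fields: list) -> pd.DataFrame:
    """Copy per-video columns onto face rows by matching faces_df['video'] to videos_df's index (hypothetical helper)."""
    out = faces_df.join(videos_df[fields], on="video")
    missing = [f for f in fields if f not in out.columns]
    if missing:
        raise KeyError(f"faces dataframe is missing fields: {missing}")
    return out


# Made-up per-video metadata, indexed by video path.
videos_df = pd.DataFrame(
    {"label": [False, True], "original": [-1, 0], "test": [False, True]},
    index=pd.Index(["Celeb-real/id0_0000.mp4", "Celeb-synthesis/id0_id1_0000.mp4"], name="path"),
)

# Made-up face rows; 'video' links each face back to its source video.
faces_df = pd.DataFrame(
    {"video": ["Celeb-real/id0_0000.mp4", "Celeb-real/id0_0000.mp4", "Celeb-synthesis/id0_id1_0000.mp4"]},
    index=pd.Index(
        [
            "Celeb-real/id0_0000.mp4/fr000_subj0.jpg",
            "Celeb-real/id0_0000.mp4/fr016_subj0.jpg",
            "Celeb-synthesis/id0_id1_0000.mp4/fr000_subj0.jpg",
        ],
        name="facepath",
    ),
)

result = attach_video_fields(faces_df, videos_df, ["label", "original", "test"])
print(result[["original", "test"]])
```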

Thank you

nicobonne commented 3 years ago

That's strange. Did you remove the old faces_df.pkl and re-run the latest code? Does the error show up in the exact same way as before, but with original field instead of test?

chinmaynehate commented 3 years ago

Hey,

Did you remove the old faces_df.pkl and re-run the latest code?

Yes I deleted the faces_df.pkl file before re-running the latest code.

Does the error show up in the exact same way as before, but with original field instead of test?

Yes

When I read the faces dataframe generated with the latest code and run print(df['original']), I get KeyError: 'original'.

But the 'test' field is present, which I verified with print(df['test']).

On the other hand, if I remove 'original' from the list, I get both the 'original' and 'test' fields:

>>> df['original']

facepath
Celeb-real/id58_0003.mp4/fr000_subj0.jpg   -1
Celeb-real/id58_0003.mp4/fr016_subj0.jpg   -1
Celeb-real/id58_0003.mp4/fr032_subj0.jpg   -1
Celeb-real/id58_0003.mp4/fr048_subj0.jpg   -1
Celeb-real/id58_0003.mp4/fr064_subj0.jpg   -1
                                           ..
YouTube-real/00192.mp4/fr411_subj0.jpg     -1
YouTube-real/00192.mp4/fr427_subj0.jpg     -1
YouTube-real/00192.mp4/fr442_subj0.jpg     -1
YouTube-real/00192.mp4/fr457_subj0.jpg     -1
YouTube-real/00192.mp4/fr473_subj0.jpg     -1
Name: original, Length: 208840, dtype: int64

>>> df['test']

facepath
Celeb-real/id58_0003.mp4/fr000_subj0.jpg    False
Celeb-real/id58_0003.mp4/fr016_subj0.jpg    False
Celeb-real/id58_0003.mp4/fr032_subj0.jpg    False
Celeb-real/id58_0003.mp4/fr048_subj0.jpg    False
Celeb-real/id58_0003.mp4/fr064_subj0.jpg    False
                                            ...  
YouTube-real/00192.mp4/fr411_subj0.jpg      False
YouTube-real/00192.mp4/fr427_subj0.jpg      False
YouTube-real/00192.mp4/fr442_subj0.jpg      False
YouTube-real/00192.mp4/fr457_subj0.jpg      False
YouTube-real/00192.mp4/fr473_subj0.jpg      False
Name: test, Length: 208840, dtype: bool

nicobonne commented 3 years ago

This commit should fix the issue; please repeat the steps and let us know whether you succeed.

chinmaynehate commented 3 years ago

Hey,

Yes. The issue is solved.

Thank you

polimi-ispl / icpr2020dfdc, issue #40