mk-minchul / AdaFace

Problems using custom dataset to train Adaface #64

Open trnikon opened 2 years ago

trnikon commented 2 years ago

I am trying to train AdaFace on my own dataset. It lives in a custom folder ('data') with a subfolder called 'imgs', which holds one folder per identity ('folder_001', 'folder_002', etc.), each containing several photos of the same face ('f_001.jpg', 'f_002.jpg', etc.). There are 30000 folders in total. I am using the following script to train (a small change from 'run_ir50_ms1mv2.sh'):

python main.py \
    --data_root /mnt/disk/data \
    --train_data_path imgs \
    --prefix ir50_ms1mv2_adaface \
    --gpus 1 \
    --use_16bit \
    --arch ir_50 \
    --batch_size 256 \
    --num_workers 8 \
    --epochs 26 \
    --lr_milestones 12,20,24 \
    --lr 0.1 \
    --head adaface \
    --m 0.4 \
    --h 0.333 \
    --custom_num_class 30000 \
    --low_res_augmentation_prob 0.2 \
    --crop_augmentation_prob 0.2 \
    --photometric_augmentation_prob 0.2

The result: FileNotFoundError: [Errno 2] No such file or directory: '/mnt/disk/data/faces_emore/agedb_30/meta/sizes'

For some reason it keeps searching for validation dataset folders such as 'agedb_30', 'faces_emore' that don't exist in my project. Why are these datasets required? Do I need to set val_data_path the same as train_data_path? Am I missing some other parameter that would make this work?

I also tried to overcome this by following the README_TRAIN.md instructions closely and downloading the dataset 'faces_webface_112x112', preprocessing it with convert.py and then replacing the folder 'imgs' with my own 'imgs' folder. The result: FileNotFoundError: Found no valid file for the classes agedb_30, faces_emore. Supported extensions are: .jpg, .jpeg, .png, .ppm, .bmp, .pgm, .tif, .tiff, .webp

Finally, it is not clear to me whether I have to convert my RGB training images to BGR before training AdaFace, and whether I can train on images with a resolution other than the standard 112x112 (e.g. 224x224).
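The second error names 'agedb_30' and 'faces_emore' as classes, which suggests the loader treated every subfolder of the training directory as an identity and found no images in those two. Below is a minimal sketch, assuming that folder-per-identity behavior, for checking a training directory before launching a run. The function name `summarize_class_folders` is hypothetical and is not part of AdaFace. The extension list is copied from the error message.

```python
from pathlib import Path

# Extensions copied from the error message in this thread.
SUPPORTED_EXTENSIONS = {
    ".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".pgm", ".tif", ".tiff", ".webp",
}


def summarize_class_folders(train_dir):
    """Return (class_count, empty_folders) for a folder-per-identity dataset.

    Hypothetical helper: a subfolder counts as a class only if it contains
    at least one file with a supported image extension (searched recursively).
    """
    root = Path(train_dir)
    class_count = 0
    empty_folders = []
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        has_image = any(
            f.is_file() and f.suffix.lower() in SUPPORTED_EXTENSIONS
            for f in folder.rglob("*")
        )
        if has_image:
            class_count += 1
        else:
            empty_folders.append(folder.name)
    return class_count, empty_folders
```

Running it on the 'imgs' folder should give a class count to compare against --custom_num_class, and any folder it reports as empty (for example a validation folder that ended up next to the identity folders) would need to be moved out of the training directory.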

mk-minchul commented 2 years ago

Hi trnikon. agedb_30 is one of the 5 validation sets used to track model performance during training. If you do not need them, replace the validation dataset and data_loader with your own. The agedb_30 files are created by this line of code: https://github.com/mk-minchul/AdaFace/blob/76f4ce203a9f768cf6c118c02124ddbae3c3dce9/convert.py#L89
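For anyone who keeps the default validation, a small pre-flight check can show which validation sets are missing before training starts. This is only a sketch: it assumes the path layout seen in the error message above (`<data_root>/<val_data_path>/<set_name>/meta/sizes`), and `find_missing_val_sets` is a hypothetical helper, not an AdaFace function. Only 'agedb_30' is named in this thread, so the list of set names is left to the caller.

```python
from pathlib import Path


def find_missing_val_sets(data_root, val_data_path, val_set_names):
    """Return the validation set names whose meta/sizes file is absent.

    Assumes the layout inferred from the FileNotFoundError in this thread:
    <data_root>/<val_data_path>/<set_name>/meta/sizes
    """
    base = Path(data_root) / val_data_path
    return [
        name
        for name in val_set_names
        if not (base / name / "meta" / "sizes").is_file()
    ]


if __name__ == "__main__":
    # Paths taken from the error message in the original post.
    missing = find_missing_val_sets("/mnt/disk/data", "faces_emore", ["agedb_30"])
    print("missing validation sets:", missing)
```

An empty list means the files the trainer looked for are in place. Any name that is returned points to a set that still has to be generated or removed from the validation step.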

ANDRESHZ commented 1 year ago

Thanks for the update, @mk-minchul.

I got the same error: "No such file or directory: ... /agedb_30/meta/sizes". In my case I am trying to validate the results with another folder of images, laid out like the train folder. For example:

- "train" folder:
    - "1"..."62" folders:
        - images of each folder class (62)
- "val" folder:
    - "1"..."62" folders not in train:
        - face images of each folder class (62)

In my case, how can I use a different validation dataset and data_loader based on my "val" folder, without using agedb_30 or a .rec file?

I hope @trnikon or @mk-minchul can tell us how this was solved.

Regards

martinenkoEduard commented 1 year ago

Have you managed to solve this?

martinenkoEduard commented 1 year ago

Have you solved this?

ANDRESHZ commented 1 year ago

Yes. Use the .rec file to create the validation data folder with convert.py:

python convert.py --rec_path <DATASET_ROOT>/<DATASET_NAME> --make_image_files --make_validation_memfiles

Then pass that path in the training command.

vkouam commented 11 months ago

Hello,

I would like to use another dataset for validation (ylfw). Do you have any idea how I should structure the dataset folder so that AdaFace can run on it?