yu4u / age-gender-estimation

Keras implementation of a CNN for age and gender estimation
MIT License

Can't train a model using the IMDB-WIKI dataset, possibly due to a corrupted .mat file #6

Open galoiscch opened 7 years ago

galoiscch commented 7 years ago

When I tried to run "create_db.py", my computer froze after the progress bar reached 100%. After a while, the command line printed "Killed". Since a file called imdb_db.mat had been created in the "data" directory, I tried to continue with the training process, but the error "IOError: could not read bytes" appeared.

I checked "imdb.mat" in the "imdb_crop" directory with Octave (I don't have MATLAB) and found that it could not be loaded. The warning "load: can not read non-ASCII portions of UTF characters; replacing unreadable characters with '?'" appeared when I tried to load "imdb.mat". When I did the same with "imdb_db.mat", I got "error: load: reading matrix data for 'image'" and "error: load: trouble reading binary file '/home/mt/Downloads/age-gender-estimation-master/data/imdb_db.mat'". Only gender, img_size, and min_score were loaded.

Is the file "imdb.mat" corrupted, or am I using it incorrectly? I am looking forward to your reply. Your help will be greatly appreciated.

galoiscch commented 7 years ago

Or could you give me the format of your labels so that I can make my own dataset? Thanks a lot.

yu4u commented 7 years ago

What was the file size of "imdb.mat"? It should be larger than 2 GB (2114467536 bytes). Could you try the smaller "wiki" dataset: python3 create_db.py --output data/wiki_db.mat --db wiki --img_size 64
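For a quick sanity check outside Octave, the metadata file can usually be opened with SciPy. A minimal sketch, assuming the dataset was extracted to data/imdb_crop/ and that the struct inside is named "imdb" as in the original IMDB-WIKI release (adjust the path to your layout):

import os
import scipy.io

mat_path = "data/imdb_crop/imdb.mat"  # assumed location of the downloaded metadata file
print(os.path.getsize(mat_path))      # a truncated download will be much smaller than expected

mat = scipy.io.loadmat(mat_path)      # raises an error if the file is corrupted
meta = mat["imdb"][0, 0]              # the metadata is stored as a single MATLAB struct
print(meta.dtype.names)               # fields such as dob, photo_taken, full_path, gender, face_score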

yu4u commented 7 years ago

The details of the labels can be found on the project page of the original work. I simply wrote a function to load the meta file, and you can use it independently of create_db.py:

from utils import get_meta
full_path, dob, gender, photo_taken, face_score, second_face_score, age = get_meta(mat_path, db)

Please refer to utils.py for implementation: https://github.com/yu4u/age-gender-estimation/blob/master/utils.py
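For reference, the dob field in the IMDB-WIKI metadata is a MATLAB serial date number and photo_taken is a year, so an age label is derived from the two. A minimal sketch of such a helper (illustrative only; it may differ in detail from what utils.py actually does):

from datetime import datetime

def calc_age(photo_taken_year, dob_matlab_datenum):
    # MATLAB serial dates count days from year 0; subtracting 366 maps them
    # onto Python's proleptic ordinal, which starts at year 1.
    birth = datetime.fromordinal(max(int(dob_matlab_datenum) - 366, 1))
    # photo_taken is only a year, so treat photos as taken around mid-year.
    if birth.month < 7:
        return photo_taken_year - birth.year
    return photo_taken_year - birth.year - 1

ages = [calc_age(t, d) for t, d in zip(photo_taken, dob)]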

galoiscch commented 7 years ago

The size of "imdb.mat" is just 22.9 MB.

galoiscch commented 7 years ago

By the way, how large was the dataset you used to train the pre-trained model that you provided? Thanks.

yu4u commented 7 years ago

I used 90% of the IMDB dataset for training and the rest for validation.

galoiscch commented 7 years ago

I started training on the wiki dataset yesterday. However, the loss became nan in the fourth epoch. I tried to use the weight files it generated, but the estimate for every person is 28, M.

yu4u commented 7 years ago

How about decreasing the learning rate? Currently you have to change the source code, because no option is provided for modifying the learning rate...

class Schedule:
    def __init__(self, nb_epochs):
        self.epochs = nb_epochs

    def __call__(self, epoch_idx):
        # Piecewise-constant learning rate; lower these values if the loss becomes nan.
        if epoch_idx < self.epochs * 0.25:
            return 0.1
        elif epoch_idx < self.epochs * 0.5:
            return 0.02
        elif epoch_idx < self.epochs * 0.75:
            return 0.004
        return 0.0008
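
A minimal sketch of how such a schedule is typically attached in Keras (the epoch count and fit call below are placeholders, not the repository's exact training setup):

from keras.callbacks import LearningRateScheduler

nb_epochs = 30
callbacks = [LearningRateScheduler(schedule=Schedule(nb_epochs))]
# model.fit(X_train, [y_gender, y_age], epochs=nb_epochs, callbacks=callbacks)
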
galoiscch commented 7 years ago

Thanks, I will try it tomorrow. However, I don't think it is due to a high learning rate, because the drop in the loss was not gradual: the loss was about 19 just before it dropped to nan.

galoiscch commented 7 years ago
3264/34324 [=>............................] - ETA: 18302s - loss: 19.6312 - dense_1_loss: 4.1135 - dense_2_loss: 15.5105 - dense_1_ac 
3296/34324 [=>............................] - ETA: 18284s - loss: nan - dense_1_loss: nan - dense_2_loss: nan - dense_1_acc: 0.7403 -

The loss suddenly dropped to nan.

yu4u commented 7 years ago

In my environment, the loss did not become nan. I used the default parameters except for the image size (--img_size 64).

Epoch 3/30
34304/34324 [============================>.] - ETA: 0s - loss: 19.6199 - dense_1_loss: 4.0342 - dense_2_loss: 15.5571 - dense_1_acc: 0.7497 - dense_2_acc: 0.0348Epoch 00002: val_loss improved from 19.65336 to 19.59147, saving model to checkpoints/weights.02-19.59.hdf5
34324/34324 [==============================] - 338s - loss: 19.6203 - dense_1_loss: 4.0342 - dense_2_loss: 15.5574 - dense_1_acc: 0.7497 - dense_2_acc: 0.0348 - val_loss: 19.5915 - val_dense_1_loss: 3.9978 - val_dense_2_loss: 15.5856 - val_dense_1_acc: 0.7520 - val_dense_2_acc: 0.0330
Epoch 4/30
34304/34324 [============================>.] - ETA: 0s - loss: 37.1522 - dense_1_loss: 2.0977 - dense_2_loss: 8.3536 - dense_1_acc: 0.6014 - dense_2_acc: 0.0245Epoch 00003: val_loss did not improve
34324/34324 [==============================] - 338s - loss: 37.1437 - dense_1_loss: 2.0968 - dense_2_loss: 8.3512 - dense_1_acc: 0.6015 - dense_2_acc: 0.0246 - val_loss: 22.4076 - val_dense_1_loss: 0.5602 - val_dense_2_loss: 4.0711 - val_dense_1_acc: 0.7520 - val_dense_2_acc: 0.0393
Epoch 5/30
34304/34324 [============================>.] - ETA: 0s - loss: 11.9625 - dense_1_loss: 0.5696 - dense_2_loss: 4.0888 - dense_1_acc: 0.7494 - dense_2_acc: 0.0376Epoch 00004: val_loss improved from 19.59147 to 6.70381, saving model to checkpoints/weights.04-6.70.hdf5
34324/34324 [==============================] - 338s - loss: 11.9594 - dense_1_loss: 0.5695 - dense_2_loss: 4.0888 - dense_1_acc: 0.7495 - dense_2_acc: 0.0376 - val_loss: 6.7038 - val_dense_1_loss: 0.5629 - val_dense_2_loss: 4.0659 - val_dense_1_acc: 0.7520 - val_dense_2_acc: 0.0443
Epoch 6/30
 9568/34324 [=======>......................] - ETA: 233s - loss: 6.2101 - dense_1_loss: 0.5671 - dense_2_loss: 4.0743 - dense_1_acc: 0.7491 - dense_2_acc: 0.0397
galoiscch commented 7 years ago

I think it is something that only happens when training on CPU.

galoiscch commented 7 years ago

What will happen if the training set is too small? I just trained the model with 139 images, and the estimate for every person is 30 years old and female. P.S. I am now making a dataset that consists of Asian faces only. Wish me good luck.

marlesson commented 7 years ago

@galoiscch if the dataset is too small, it will not be representative enough of the population; that is, the model will not generalize. Deep learning needs a LOT of data for training.

nyck33 commented 5 years ago

I had the same problem: my imdb_db.mat was only 200 B in size, and create_db.py ended with a memory error.

yu4u commented 5 years ago

@nyck33 The implementation is really bad... I fixed the problem a little. Please check out the latest code and try it. It works with half the memory.

nyck33 commented 5 years ago

Hi Yu4u-san,

I tried, but it still ended with a MemoryError. I tried the compress=True option with the old code (still MemoryErrors), so I will try the same with the updated one. However, I trained on UTKFace with depth 10 and width 4 to almost 90 percent gender accuracy in 29 epochs, so I can send you the weights file if you think it would be useful. Thanks for all your help.
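For context, if the database is written with SciPy, compression is requested at save time via do_compression. A minimal sketch, assuming create_db.py stores its output with scipy.io.savemat (the arrays below are illustrative placeholders, not the script's real contents):

import numpy as np
import scipy.io

# Placeholder contents; the real script stores cropped face images plus labels.
output = {"image": np.zeros((10, 64, 64, 3), dtype=np.uint8),
          "gender": np.zeros(10, dtype=np.int32),
          "age": np.zeros(10, dtype=np.int32),
          "img_size": 64}
scipy.io.savemat("data/imdb_db.mat", output, do_compression=True)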


sebastiaopamplona commented 4 years ago

The size of "imdb.mat" is just 22.9 MB.

Hello @galoiscch, the size of my imdb.mat is also 22.9 MB... Have you found the original imdb.mat file?