michiyosony opened 7 years ago
Because the program uses multiprocessing when loading the data, the loaded batches are first aggregated in a queue before actually being consumed by the model. Counting the amount of data the model consumes in each epoch by counting hits in the generator's method is therefore not reliable. The program delegates model feeding to Keras, so there should be no problem. Btw, I might have a wrong understanding of the output you've shown, so do clarify if you think I'm wrong.
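A minimal sketch of how those two counts can diverge (the generator below and the training arguments are hypothetical, not the repo's actual code): when Keras drives `fit_generator` with worker processes, batches are produced into a queue ahead of the training loop, so the generator is called more often than the model actually consumes batches within the epoch.

```python
import itertools

calls = 0  # times a batch was *produced*, not times the model consumed one

def counting_generator(batches):
    """Yield batches forever, counting each time one is produced."""
    global calls
    for batch in itertools.cycle(batches):
        calls += 1   # incremented when a worker pulls the next batch...
        yield batch  # ...which may then sit in the queue before training uses it

# Hypothetical call (Keras 2 argument names; Keras 1 used max_q_size/nb_worker):
# model.fit_generator(counting_generator(train_batches),
#                     steps_per_epoch=8,   # batches the model trains on per epoch
#                     epochs=1,
#                     max_queue_size=10,   # up to 10 batches prefetched ahead
#                     workers=2)
# After one epoch, `calls` can exceed steps_per_epoch by roughly the queue
# size, which could plausibly account for 22 loaded videos versus the 16
# the epoch itself consumes.
```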
Yeah, it is a deviation from the paper. The reason I leave it to chance is two-fold. The first is to ensure non-determinism in per-epoch training. The second is to avoid the model overfitting on the mirrored duplicates, which look almost identical to the originals because the GRID data contains frontal views of the speaker. I might be wrong about this, so further investigation is needed.
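As a rough sketch of what the chance-based mirroring described above amounts to (the function name and the (time, height, width, channels) layout are assumptions, not the repo's actual API):

```python
import numpy as np

def maybe_flip(video, p=0.5, rng=np.random):
    """Return the video mirrored horizontally with probability p.

    Assumes frames stacked as (time, height, width, channels); reversing
    the width axis yields the horizontally mirrored sequence.
    """
    if rng.random_sample() < p:
        return video[:, :, ::-1, :]
    return video
```

Because the coin is re-flipped every time a video is drawn, a given video may appear regular in one epoch and mirrored in the next, which is the non-determinism mentioned above.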
My (possibly incorrect) understanding of an "epoch" is a set of iterations over which the model is exposed to each item in the training set exactly once.
In trying to understand the system better, I created a very small training set composed of 8 videos and the corresponding .align files.
I modified unseen_speakers/train.py so that training would run for 1 epoch with a batch size of 2.
My output looks like this:
Why does it appear that the model is exposed to 22 videos during the first epoch? From the paper, I would have expected 16 (the 8 training videos + 8 horizontally flipped training videos).
The 16 original videos loaded can be seen (organized) here (asterisks added):
In Curriculum.py,
I can see that each video has a 50% chance of being flipped horizontally. This looks like a slightly different implementation of "...we train on both the regular and the horizontally mirrored image sequence." (LipNet). Is there a motivation for leaving it to chance whether both a video and its mirror will be included (as opposed to the same video twice, as seen in the asterisked examples above)?
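For contrast, a minimal sketch of the deterministic scheme the paper's wording suggests, in which every epoch contains each video exactly twice, once regular and once mirrored (build_epoch is a hypothetical name, not the repo's code):

```python
def build_epoch(videos):
    """Pair every video with its horizontal mirror, doubling the epoch."""
    samples = []
    for video in videos:
        samples.append(video)                 # regular sequence
        samples.append(video[:, :, ::-1, :])  # horizontally mirrored sequence
    return samples  # 8 videos -> 16 samples, every pair guaranteed distinct
```

With independent 50% flips applied to two copies of the same video instead, only about half the pairs come out as one regular plus one mirror; a quarter are two regular copies and a quarter two mirrored copies, which appears to be what the asterisked duplicates in the output show.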