Videos seen by model each epoch

My (possibly incorrect) understanding is that an "epoch" is the set of iterations during which the model is exposed to each item in the training set exactly once.
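
A minimal sketch of that definition, assuming a plain Python list of sample IDs and a shuffled visiting order. The function `iterate_epoch` is hypothetical and not part of LipNet. Every item is yielded exactly once per epoch, split into minibatches:

```python
import random

def iterate_epoch(items, batch_size, seed=None):
    """Yield minibatches that together contain every item exactly once."""
    order = list(items)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(order)   # shuffle the visiting order for this epoch
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]

videos = ["s1/s1lbax4n", "s1/s1swwp2s", "s1/s1pwij3p", "s1/s1bbaf2n",
          "s2/s2lbax4n", "s2/s2swwp2s", "s2/s2pwij3p", "s2/s2bbaf2n"]

seen = [v for batch in iterate_epoch(videos, batch_size=2, seed=0) for v in batch]
print(sorted(seen) == sorted(videos))  # True: each video appears exactly once
```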

To understand the system better, I created a very small training set composed of:

s1/
    s1lbax4n
    s1swwp2s
    s1pwij3p
    s1bbaf2n
s2/
    s2lbax4n
    s2swwp2s
    s2pwij3p
    s2bbaf2n

and the corresponding .align files.

I modified unseen_speakers/train.py to train using the line:

train(run_name, 0, 1, 3, 100, 50, 75, 32, 2)

so that training would run for 1 epoch with a batch size of 2.
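
Under that definition, the expected size of one epoch for this configuration can be worked out directly. This sketch assumes the training generator rounds up when dividing the number of videos by the minibatch size; that is an assumption, not something confirmed from the LipNet source:

```python
n_videos = 8        # s1/ and s2/, four videos each
minibatch_size = 2  # last argument passed to train(...) above

# Integer ceiling division: number of minibatches needed to cover every video once.
steps_per_epoch = (n_videos + minibatch_size - 1) // minibatch_size
samples_per_epoch = steps_per_epoch * minibatch_size
print(steps_per_epoch, samples_per_epoch)  # 4 8

# If every video were used in both orientations (regular and mirrored),
# the epoch would hold twice as many samples.
mirrored_steps = (2 * n_videos + minibatch_size - 1) // minibatch_size
print(mirrored_steps)  # 8
```

The 4-step figure agrees with the `4/4` progress bar in the output below.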

My output looks like this:

epoch is: 0
Epoch 0: Curriculum(train: True, sentence_length: -1, flip_probability: 0.5, jitter_probability: 0.05)
Train [0,1] 0:2
Epoch 1/1
epoch is: 0
Epoch 0: Curriculum(train: True, sentence_length: -1, flip_probability: 0.5, jitter_probability: 0.05)
Train [0,1] 2:4
In Curriculum.apply: NOT flipping video s2/s2swwp2s
In Curriculum.apply: NOT flipping video s1/s1lbax4n
In Curriculum.apply: NOT flipping video s1/s1swwp2s
In Curriculum.apply: flipping video s1/s1bbaf2n
Train [0,0] 4:6
Train [0,0] 6:8
In Curriculum.apply: flipping video s1/s1pwij3p
In Curriculum.apply: NOT flipping video s2/s2bbaf2n
In Curriculum.apply: NOT flipping video s2/s2lbax4n
In Curriculum.apply: NOT flipping video s2/s2pwij3p
Train [0,0] 0:2
Train [0,0] 2:4
In Curriculum.apply: flipping video s1/s1lbax4n
In Curriculum.apply: NOT flipping video s2/s2swwp2s
In Curriculum.apply: NOT flipping video s1/s1swwp2s
In Curriculum.apply: NOT flipping video s1/s1bbaf2n
Train [0,0] 4:6
Train [0,0] 6:8
In Curriculum.apply: NOT flipping video s1/s1pwij3p
In Curriculum.apply: flipping video s2/s2bbaf2n
In Curriculum.apply: flipping video s2/s2lbax4n
In Curriculum.apply: flipping video s2/s2pwij3p
1/4 [======>.......................] - ETA: 255s - loss: 191.3861Train [0,0] 0:2
In Curriculum.apply: flipping video s2/s2swwp2s
In Curriculum.apply: NOT flipping video s1/s1swwp2s

2/4 [==============>...............] - ETA: 168s - loss: 183.9747Train [0,0] 2:4
In Curriculum.apply: flipping video s1/s1lbax4n
In Curriculum.apply: flipping video s1/s1bbaf2n

3/4 [=====================>........] - ETA: 83s - loss: 180.0006 Train [0,0] 4:6
In Curriculum.apply: flipping video s1/s1pwij3p
In Curriculum.apply: NOT flipping video s2/s2lbax4n
epoch is: 0
Epoch 0: Curriculum(train: False, sentence_length: -1, flip_probability: 0.5, jitter_probability: 0.05)
epoch is: 0
Epoch 0: Curriculum(train: False, sentence_length: -1, flip_probability: 0.5, jitter_probability: 0.05)
epoch is: 0
Epoch 0: Curriculum(train: False, sentence_length: -1, flip_probability: 0.5, jitter_probability: 0.05)

[Epoch 0] Out of 256 samples: [CER: 30.250 - 1.440] [WER: 6.000 - 1.000] [BLEU: 0.325 - 0.325]

/Users/michiyosony/tensorflow/lib/python2.7/site-packages/nltk/translate/bleu_score.py:472: UserWarning: 
Corpus/Sentence contains 0 counts of 2-gram overlaps.
BLEU scores might be undesirable; use SmoothingFunction().
  warnings.warn(_msg)

4/4 [==============================] - 1326s - loss: 173.2259 - val_loss: 145.0103

Process finished with exit code 0

Why does the model appear to be exposed to 22 videos during the first epoch? Based on the paper, I would have expected 16 (the 8 training videos plus their 8 horizontally flipped versions).
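
To check that count without tallying by hand, a small helper can count the `Curriculum.apply` lines in a saved copy of the output. This is a hypothetical sketch that relies only on the log format shown above; `count_loads` is not part of LipNet:

```python
import re
from collections import Counter

# Matches lines such as "In Curriculum.apply: NOT flipping video s1/s1swwp2s".
APPLY_RE = re.compile(r"In Curriculum\.apply: (NOT )?flipping video (\S+)")

def count_loads(log_text):
    """Count (video, flipped) pairs printed by Curriculum.apply in a log."""
    counts = Counter()
    for match in APPLY_RE.finditer(log_text):
        flipped = match.group(1) is None   # no "NOT " prefix means the video was flipped
        counts[(match.group(2), flipped)] += 1
    return counts

sample = """In Curriculum.apply: NOT flipping video s2/s2swwp2s
In Curriculum.apply: flipping video s1/s1bbaf2n
In Curriculum.apply: NOT flipping video s2/s2swwp2s
"""
counts = count_loads(sample)
print(sum(counts.values()))              # 3
print(counts[("s2/s2swwp2s", False)])    # 2
```

Applied to the complete output above, the total comes to 22.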

The first 16 videos loaded, regrouped by speaker and orientation, are listed below (asterisks added):

In Curriculum.apply: flipping video s1/s1bbaf2n
In Curriculum.apply: flipping video s1/s1pwij3p
In Curriculum.apply: flipping video s1/s1lbax4n
**In Curriculum.apply: NOT flipping video s1/s1swwp2s**
**In Curriculum.apply: NOT flipping video s1/s1swwp2s**
In Curriculum.apply: NOT flipping video s1/s1bbaf2n
In Curriculum.apply: NOT flipping video s1/s1pwij3p
In Curriculum.apply: NOT flipping video s1/s1lbax4n

In Curriculum.apply: flipping video s2/s2bbaf2n
In Curriculum.apply: flipping video s2/s2lbax4n
In Curriculum.apply: flipping video s2/s2pwij3p
**In Curriculum.apply: NOT flipping video s2/s2swwp2s**
**In Curriculum.apply: NOT flipping video s2/s2swwp2s**
In Curriculum.apply: NOT flipping video s2/s2bbaf2n
In Curriculum.apply: NOT flipping video s2/s2lbax4n
In Curriculum.apply: NOT flipping video s2/s2pwij3p

In Curriculum.py I can see that each video has a 50% chance of being flipped horizontally. This looks like a slightly different implementation of "...we train on both the regular and the horizontally mirrored image sequence." (LipNet). Is there a motivation for leaving it to chance whether both a video and its mirror image are included, which can instead result in the same unflipped video appearing twice, as in the asterisked examples above?
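
The two readings of the augmentation can be contrasted in a short sketch. Both functions are hypothetical illustrations, not the code in Curriculum.py: `random_flip_epoch` mimics the 50% per-sample flip described above, and `mirrored_epoch` is the deterministic "regular plus mirrored" scheme quoted from the paper:

```python
import random

def random_flip_epoch(videos, rng, p=0.5):
    """One sample per video; each is mirrored independently with probability p."""
    return [(v, rng.random() < p) for v in videos]

def mirrored_epoch(videos):
    """Every video appears twice: once as recorded and once mirrored."""
    return [(v, flipped) for v in videos for flipped in (False, True)]

videos = ["s1/s1lbax4n", "s1/s1swwp2s", "s2/s2lbax4n", "s2/s2swwp2s"]
rng = random.Random(0)

print(len(random_flip_epoch(videos, rng)))  # 4: one random orientation each
print(len(mirrored_epoch(videos)))          # 8: both orientations each

# If a video happens to be loaded twice with independent p = 0.5 flips,
# the chance that both orientations are included is 2 * 0.5 * 0.5 = 0.5;
# the other half of the time the same orientation appears twice.
p = 0.5
p_both_orientations = 2 * p * (1 - p)
print(p_both_orientations)  # 0.5
```

The random scheme keeps the epoch size unchanged and mirrors each video about half the time across many epochs, while the deterministic scheme doubles the epoch and guarantees both orientations every time.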

rizkiarm / LipNet, issue #14