tsudalab / ChemTS

Molecule Design using Monte Carlo Tree Search with Neural Rollout

Accuracy Goes Down And Loss Increases After Fifth Epoch #3

Closed · LRParser closed this issue 6 years ago

LRParser commented 6 years ago

Thanks for a great paper and project! Just a quick question - is it expected to see the validation loss increasing during training? Here's an excerpt of my train_RNN.py run:

(cs510-env) joe@msivr:~/ChemTS/train_RNN$ python train_RNN.py 
Using TensorFlow backend.
['\n', '&', 'C', '(', ')', 'c', '1', '2', 'o', '=', 'O', 'N', '3', 'F', '[C@@H]', 'n', '-', '#', 'S', 'Cl', '[O-]', '[C@H]', '[NH+]', '[C@]', 's', 'Br', '/', '[nH]', '[NH3+]', '4', '[NH2+]', '[C@@]', '[N+]', '[nH+]', '\\', '[S@]', '5', '[N-]', '[n+]', '[S@@]', '[S-]', '6', '7', 'I', '[n-]', 'P', '[OH+]', '[NH-]', '[P@@H]', '[P@@]', '[PH2]', '[P@]', '[P+]', '[S+]', '[o+]', '[CH2-]', '[CH-]', '[SH+]', '[O+]', '[s+]', '[PH+]', '[PH]', '8', '[S@@+]']
249456
(249456, 81, 64)
train_RNN.py:276: UserWarning: Update your `GRU` call to the Keras 2 API: `GRU(input_shape=(81, 64), activation="tanh", return_sequences=True, units=512)`
  model.add(GRU(output_dim=512, input_shape=(81,64),activation='tanh',return_sequences=True))
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_1 (Embedding)      (None, 81, 64)            4096      
_________________________________________________________________
gru_1 (GRU)                  (None, 81, 512)           886272    
_________________________________________________________________
dropout_1 (Dropout)          (None, 81, 512)           0         
_________________________________________________________________
gru_2 (GRU)                  (None, 81, 512)           1574400   
_________________________________________________________________
dropout_2 (Dropout)          (None, 81, 512)           0         
_________________________________________________________________
time_distributed_1 (TimeDist (None, 81, 64)            32832     
=================================================================
Total params: 2,497,600
Trainable params: 2,497,600
Non-trainable params: 0
_________________________________________________________________
None
/home/joe/anaconda3/envs/cs510-env/lib/python3.6/site-packages/keras/models.py:844: UserWarning: The `nb_epoch` argument in `fit` has been renamed `epochs`.
  warnings.warn('The `nb_epoch` argument in `fit` '
Train on 224510 samples, validate on 24946 samples
Epoch 1/100
2017-10-24 20:24:31.121363: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-24 20:24:31.149499: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-24 20:24:31.149559: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-10-24 20:24:31.149574: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-24 20:24:31.149586: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-10-24 20:24:32.398217: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-10-24 20:24:32.473252: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties: 
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.645
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 7.58GiB
2017-10-24 20:24:32.490260: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 
2017-10-24 20:24:32.490558: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y 
2017-10-24 20:24:32.533155: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0)
224510/224510 [==============================] - 601s - loss: 1.5048 - acc: 0.5973 - val_loss: 1.0923 - val_acc: 0.6607
Epoch 2/100
224510/224510 [==============================] - 171s - loss: 1.2080 - acc: 0.6355 - val_loss: 1.1231 - val_acc: 0.6476
Epoch 3/100
224510/224510 [==============================] - 174s - loss: 1.1588 - acc: 0.6409 - val_loss: 1.0980 - val_acc: 0.6479
Epoch 4/100
224510/224510 [==============================] - 175s - loss: 1.1511 - acc: 0.6408 - val_loss: 1.0865 - val_acc: 0.6530
Epoch 5/100
224510/224510 [==============================] - 171s - loss: 1.1522 - acc: 0.6461 - val_loss: 1.0804 - val_acc: 0.6530
Epoch 6/100
224510/224510 [==============================] - 177s - loss: 1.8968 - acc: 0.5763 - val_loss: 2.2989 - val_acc: 0.5471
Epoch 7/100
224510/224510 [==============================] - 172s - loss: 2.3063 - acc: 0.5471 - val_loss: 2.3015 - val_acc: 0.5470
Epoch 8/100
224510/224510 [==============================] - 170s - loss: 2.3078 - acc: 0.5470 - val_loss: 2.3045 - val_acc: 0.5470
Epoch 9/100
224510/224510 [==============================] - 170s - loss: 2.3095 - acc: 0.5469 - val_loss: 2.3082 - val_acc: 0.5470
Epoch 10/100
224510/224510 [==============================] - 170s - loss: 2.3084 - acc: 0.5469 - val_loss: 2.3025 - val_acc: 0.5472
Epoch 11/100
224510/224510 [==============================] - 170s - loss: 2.3110 - acc: 0.5470 - val_loss: 2.3051 - val_acc: 0.5470
Epoch 12/100
224510/224510 [==============================] - 170s - loss: 2.3110 - acc: 0.5470 - val_loss: 2.3112 - val_acc: 0.5460
Epoch 13/100

I would have expected validation loss to go down during training, but perhaps I'm not letting training run long enough? Any guidance here is appreciated.
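For reference, here is roughly how I read the model from the summary above, rewritten against the Keras 2 API that the two deprecation warnings point to. This is only a sketch reconstructed from the printed layer sizes; the dropout rate, the optimizer, and the dummy data shapes are my assumptions, not the actual contents of train_RNN.py:

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Embedding, GRU, Dropout, Dense, TimeDistributed

    # Layer sizes taken from the printed summary: 64-token vocabulary,
    # sequence length 81, embedding dim 64, two 512-unit GRU layers.
    model = Sequential()
    model.add(Embedding(input_dim=64, output_dim=64, input_length=81))
    model.add(GRU(512, activation='tanh', return_sequences=True))
    model.add(Dropout(0.2))  # rate is an assumption
    model.add(GRU(512, activation='tanh', return_sequences=True))
    model.add(Dropout(0.2))
    model.add(TimeDistributed(Dense(64, activation='softmax')))
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',  # optimizer is an assumption
                  metrics=['accuracy'])

    # Dummy stand-ins for the preprocessed SMILES data (shapes assumed):
    X = np.random.randint(0, 64, size=(1000, 81))              # integer token indices
    y = np.eye(64)[np.random.randint(0, 64, size=(1000, 81))]  # one-hot next-token targets

    # Keras 2 renamed `nb_epoch` to `epochs`, as the second warning notes.
    model.fit(X, y, batch_size=512, epochs=1, validation_split=0.1)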

yangxiufengsia commented 6 years ago

Hi @LRParser, thanks for reporting this. I think this is in the nature of how neural networks train: the behaviour depends heavily on how the RNN parameters are initialized. The loss increasing means the learning rate should be made smaller (the code will be updated soon). If you want faster convergence, you can also reduce the hidden dimension of the GRU layers (e.g. from 512 to 256), as sketched below.
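Concretely, the two changes look roughly like this. This is only a sketch: the exact settings in the updated train_RNN.py may differ, and the 0.0005 learning rate is just an example value smaller than the Keras default:

    from keras.models import Sequential
    from keras.layers import Embedding, GRU, Dropout, Dense, TimeDistributed
    from keras.optimizers import Adam

    model = Sequential()
    model.add(Embedding(input_dim=64, output_dim=64, input_length=81))
    # Smaller hidden state: 256 GRU units instead of 512 for faster convergence.
    model.add(GRU(256, activation='tanh', return_sequences=True))
    model.add(Dropout(0.2))
    model.add(GRU(256, activation='tanh', return_sequences=True))
    model.add(Dropout(0.2))
    model.add(TimeDistributed(Dense(64, activation='softmax')))
    # Smaller learning rate so the loss does not jump the way it did after epoch 5.
    model.compile(loss='categorical_crossentropy',
                  optimizer=Adam(lr=0.0005),  # example value; the Keras default is 0.001
                  metrics=['accuracy'])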

I have updated train_RNN.py; please re-run it with: python train_RNN.py. You should now obtain good accuracy and a smaller loss.