lvapeab / nmt-keras

Neural Machine Translation with Keras
http://nmt-keras.readthedocs.io
MIT License

Error During Training #107

Closed: mightymiff closed this issue 4 years ago

mightymiff commented 4 years ago

Hello,

I am training a model whose auto-generated name is Model_AttentionRNNEncoderDecoder_src_emb_200_bidir_True_enc_GRU_32_dec_ConditionalGRU_32_deepout_linear_trg_emb_200_Adam_0.001.
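
The auto-generated name appears to encode much of the configuration: source/target embedding sizes, whether the encoder is bidirectional, encoder and decoder RNN types and sizes, the deep-output type, the optimizer and the learning rate. A minimal sketch that decodes it, assuming the `<field>_<value>` layout seen in this one name holds in general; the dict keys (`src_emb`, `enc_rnn`, ...) are hypothetical labels of my own, not nmt-keras parameter names.

```python
import re

# Pattern inferred from the single example name in this thread; other
# configurations may produce names this regex does not cover.
MODEL_NAME_RE = re.compile(
    r"^(?:Model_)?(?P<model_type>[A-Za-z]+)"
    r"_src_emb_(?P<src_emb>\d+)"
    r"_bidir_(?P<bidir>True|False)"
    r"_enc_(?P<enc_rnn>[A-Za-z]+)_(?P<enc_size>\d+)"
    r"_dec_(?P<dec_rnn>[A-Za-z]+)_(?P<dec_size>\d+)"
    r"_deepout_(?P<deepout>[A-Za-z_]+?)"
    r"_trg_emb_(?P<trg_emb>\d+)"
    r"_(?P<optimizer>[A-Za-z]+)_(?P<lr>[0-9.eE+-]+)$"
)


def parse_model_name(name):
    """Split an auto-generated model name into labelled fields (labels are hypothetical)."""
    m = MODEL_NAME_RE.match(name)
    if m is None:
        raise ValueError("unrecognised model name: %s" % name)
    fields = m.groupdict()
    # Convert the numeric and boolean parts from strings.
    for key in ("src_emb", "enc_size", "dec_size", "trg_emb"):
        fields[key] = int(fields[key])
    fields["bidir"] = fields["bidir"] == "True"
    fields["lr"] = float(fields["lr"])
    return fields


info = parse_model_name(
    "Model_AttentionRNNEncoderDecoder_src_emb_200_bidir_True_enc_GRU_32"
    "_dec_ConditionalGRU_32_deepout_linear_trg_emb_200_Adam_0.001"
)
print(info["enc_rnn"], info["enc_size"], info["dec_rnn"], info["lr"])
# GRU 32 ConditionalGRU 0.001
```

The name only covers part of the setup (nothing about data paths, batch size or maximum lengths, for example), which is why the full configuration file is still needed to debug a problem like this one.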

The model appears to run and train fine until the beginning of the 3rd epoch, where I get this out-of-range error. Any ideas what might be causing this?

Traceback (most recent call last):
  File "main.py", line 51, in <module>
    train_model(parameters, args.dataset)
  File "/home/nmt-keras/nmt_keras/training.py", line 166, in train_model
    nmt_model.trainNet(dataset, training_params)
  File "/home/miniconda3/envs/textsum/lib/python3.6/site-packages/keras_wrapper/cnn_model.py", line 923, in trainNet
    self.__train(ds, params)
  File "/home/miniconda3/envs/textsum/lib/python3.6/site-packages/keras_wrapper/cnn_model.py", line 1152, in __train
    initial_epoch=params['epoch_offset'])
  File "/home/nmt-keras/src/keras/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/home/nmt-keras/src/keras/keras/engine/training.py", line 1709, in fit_generator
    initial_epoch=initial_epoch)
  File "/home/nmt-keras/src/keras/keras/engine/training_generator.py", line 221, in fit_generator
    callbacks._call_batch_hook('train', 'end', batch_index, batch_logs)
  File "/home/nmt-keras/src/keras/keras/callbacks.py", line 85, in _call_batch_hook
    batch_hook(batch, logs)
  File "/home/nmt-keras/src/keras/keras/callbacks.py", line 366, in on_train_batch_end
    self.on_batch_end(batch, logs=logs)
  File "/home/miniconda3/envs/textsum/lib/python3.6/site-packages/keras_wrapper/extra/callbacks.py", line 762, in on_batch_end
    params_prediction)
  File "/home/miniconda3/envs/textsum/lib/python3.6/site-packages/keras_wrapper/cnn_model.py", line 1695, in predictBeamSearchNet
    data = next(data_gen)
  File "/home/miniconda3/envs/textsum/lib/python3.6/site-packages/keras_wrapper/dataset.py", line 445, in generator
    da_enhance_list=self.params['da_enhance_list'])
  File "/home/miniconda3/envs/textsum/lib/python3.6/site-packages/keras_wrapper/dataset.py", line 4417, in getXY_FromIndices
    label_smoothing=self.label_smoothing[id_out][set_name])
  File "/home/miniconda3/envs/textsum/lib/python3.6/site-packages/keras_wrapper/dataset.py", line 2450, in loadTextFeaturesOneHot
    y = self.loadTextFeatures(X, max_len, pad_on_batch, offset)
  File "/home/miniconda3/envs/textsum/lib/python3.6/site-packages/keras_wrapper/dataset.py", line 2421, in loadTextFeatures
    X_mask = np.hstack((np.ones((X_mask.shape[0], 1)), X_mask[:, :-1]))
IndexError: tuple index out of range
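
The failing line is plain NumPy that expects `X_mask` to be a 2-D (batch, time) array: it prepends a column of ones and drops the last time step, which appears to shift the mask right by one position. The specific message `tuple index out of range` is what Python raises when `X_mask.shape[0]` is read from an array whose shape tuple is empty, i.e. a 0-d array. A minimal sketch reproducing both cases in isolation; `shift_mask` is a hypothetical wrapper around the expression from the traceback, and the thread does not confirm which malformed value actually reached this line inside keras_wrapper.

```python
import numpy as np


def shift_mask(X_mask):
    # Hypothetical wrapper around the expression at dataset.py line 2421 in the
    # traceback: prepend a column of ones and drop the last column.
    return np.hstack((np.ones((X_mask.shape[0], 1)), X_mask[:, :-1]))


# Well-formed (batch=2, time=3) mask: the shift works.
good = np.array([[1, 1, 0],
                 [1, 0, 0]], dtype=float)
print(shift_mask(good))
# [[1. 1. 1.]
#  [1. 1. 0.]]

# 0-d array: X_mask.shape == (), so X_mask.shape[0] raises IndexError.
bad = np.array(1.0)
try:
    shift_mask(bad)
except IndexError as err:
    print(err)  # tuple index out of range
```

Note that the last frames of the traceback run under `on_batch_end` and `predictBeamSearchNet`, so the crash happens while the callback is generating predictions for evaluation, not inside a training update itself.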
lvapeab commented 4 years ago

Hello,

can you please attach your configuration file?

mightymiff commented 4 years ago

Sure, sorry for the delay. I am not sure how to upload a file here without a window manager, so here is a link.

I have been kind of blindly fiddling with configuration options; most recently, training made it to 14 epochs before exiting with the same error.

hiwaveSupport commented 4 years ago

I got the same error after 32 epochs. Below is my config.py output. The key change was setting the source and model text embedding length to 56; I also changed the batch size to 50.

config.txt

Also, I looked at the files under examples/data_files, and it seems that the training and test files contain some empty lines. Could this be the reason?
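
A parallel corpus is aligned by line number (line i of the source file is the translation of line i of the target file), so a quick sanity check before training is to compare line counts and list empty lines on each side. A minimal sketch, assuming plain-text files with one sentence per line; the script name and file paths in the usage line are hypothetical.

```python
import sys


def read_lines(path, encoding="utf-8"):
    # Iterate the file object so that only line breaks recognised by text mode's
    # universal-newline handling end a line; str.splitlines() would also split
    # on characters such as U+2028, giving a different count.
    with open(path, encoding=encoding) as f:
        return [line.rstrip("\n") for line in f]


def check_parallel(src_path, trg_path, encoding="utf-8"):
    """Report line counts and empty lines for a source/target file pair."""
    src = read_lines(src_path, encoding)
    trg = read_lines(trg_path, encoding)
    return {
        "src_lines": len(src),
        "trg_lines": len(trg),
        "same_length": len(src) == len(trg),
        # 1-based line numbers, as most editors display them
        "src_empty": [i for i, line in enumerate(src, 1) if not line.strip()],
        "trg_empty": [i for i, line in enumerate(trg, 1) if not line.strip()],
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    for key, value in check_parallel(sys.argv[1], sys.argv[2]).items():
        print(key, value)
```

Usage (hypothetical paths): `python check_parallel.py data/training.src data/training.trg`. Run it on the training, validation and test pairs; a `same_length` of `False` means every pair after the first extra or missing line is misaligned.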

Update: Fixed. The issue was caused by mismatched rows in my training data. Fixing the empty rows and aligning them with the correct lines on the target side of the training data resolved the issue, and the model now runs.
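
The fix above was to repair the empty and misaligned rows. When both files already have the same number of lines and only some pairs are blank on one side, a common alternative is to drop those pairs from both files together so the remaining pairs stay aligned. A minimal sketch of that alternative (not what the poster did); it refuses to run when the line counts differ, because deleting lines cannot repair an offset that has already shifted every later pair. The function name and paths are hypothetical.

```python
def read_lines(path, encoding="utf-8"):
    # One entry per line, trailing newline removed.
    with open(path, encoding=encoding) as f:
        return [line.rstrip("\n") for line in f]


def drop_empty_pairs(src_in, trg_in, src_out, trg_out, encoding="utf-8"):
    """Write only pairs where both sides are non-empty; return the number dropped."""
    src = read_lines(src_in, encoding)
    trg = read_lines(trg_in, encoding)
    if len(src) != len(trg):
        # A count mismatch means the pairing itself is off; fix it by hand first.
        raise ValueError("line counts differ: %d vs %d" % (len(src), len(trg)))
    kept = [(s, t) for s, t in zip(src, trg) if s.strip() and t.strip()]
    with open(src_out, "w", encoding=encoding) as fs, \
            open(trg_out, "w", encoding=encoding) as ft:
        for s, t in kept:
            fs.write(s + "\n")
            ft.write(t + "\n")
    return len(src) - len(kept)
```

Writing both outputs from the same filtered list of pairs is what keeps them aligned; filtering each file separately would reintroduce exactly the mismatch described in this thread.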

lvapeab commented 4 years ago

Glad to hear that. I'm closing this. Feel free to re-open it if the error persists.