Open · amirj opened this issue 8 years ago
Thank you for pointing this out. prepare_data is used in two different places, with different behavior (although I agree that there is a redundancy in the filtering).

Since the gradient is not truncated (see the truncate_gradient parameter in the scan function here), we store all the activations from the forward pass to be used in the backward pass, which has an impact on computation time and on the amount of memory being used. For longer sequences (like 1000 in your case) you might need to play with the truncate_gradient parameter of scan.
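For illustration, here is a minimal, self-contained sketch (a toy recurrence, not the tutorial's own layers) of how the truncate_gradient argument is passed to theano.scan: the default of -1 back-propagates through the full sequence, while an integer N limits backpropagation through time to the last N steps.

```python
# Toy RNN showing theano.scan's truncate_gradient argument.
import numpy
import theano
import theano.tensor as tensor

floatX = theano.config.floatX
dim = 3

x = tensor.matrix('x')                    # (n_timesteps, dim)
h0 = tensor.zeros((dim,), dtype=floatX)   # initial hidden state
W = theano.shared(0.1 * numpy.random.randn(dim, dim).astype(floatX), name='W')

def step(x_t, h_tm1):
    # simple recurrent transition
    return tensor.tanh(tensor.dot(h_tm1, W) + x_t)

h, _ = theano.scan(step,
                   sequences=x,
                   outputs_info=h0,
                   truncate_gradient=50)  # BPTT through at most the last 50 steps
                                          # (-1, the default, keeps the full history)

cost = h[-1].sum()
grad_W = tensor.grad(cost, W)             # gradient only flows back 50 steps
f = theano.function([x], [cost, grad_W])
f(numpy.random.randn(1000, dim).astype(floatX))  # e.g. a length-1000 sequence
```

Truncating trades some gradient accuracy for the reduced computation time and memory mentioned above.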
I have the same problem, but for me the line shows a maxlen smaller than what I want (I want 100 words, but I am only getting 15 as maxlen). I don't want my training to be carried out only on sentences of length 15.
def train(dim_word=100, # word vector dimensionality
dim=1000, # the number of LSTM units
encoder='gru',
decoder='gru_cond',
patience=10, # early stopping patience
max_epochs=5000,
finish_after=100000000000000000000000, # finish after this many updates
dispFreq=100,
decay_c=0., # L2 regularization penalty
alpha_c=0., # alignment regularization
clip_c=-1., # gradient clipping threshold
lrate=0.01, # learning rate
n_words_src=65000, # source vocabulary size
n_words=50000, # target vocabulary size
maxlen=100, # maximum length of the description
optimizer='rmsprop',
hans@hans-Lenovo-IdeaPad-Y500:~/Documents/HANS/MAC/SUCCESSFUL MODELS/ADD/dl4mt-tutorial-master/session3$ ./train.sh
Using gpu device 0: GeForce GT 650M (CNMeM is disabled, cuDNN 4007)
{'use-dropout': [True], 'dim': [1000], 'optimizer': ['rmsprop'], 'dim_word': [150], 'reload': [False], 'clip-c': [1.0], 'n-words': [50000], 'model': ['/home/hans/git/dl4mt-tutorial/session3/model.npz'], 'learning-rate': [0.0001], 'decay-c': [0.99]}
Loading data
Building model
Building sampler
Building f_init... Done
Building f_next.. Done
Building f_log_probs... Done
Building f_cost... Done
Computing gradient... Done
Building optimizers... Done
Optimization
...................................
...................................
...................................
Epoch 0 Update 65 Cost 17509.4765625 UD 0.767469167709
Epoch 0 Update 66 Cost 17504.859375 UD 0.822523832321
Minibatch with zero sample under length 15
Minibatch with zero sample under length 15
Minibatch with zero sample under length 15
Minibatch with zero sample under length 15
Minibatch with zero sample under length 15
Minibatch with zero sample under length 15
Epoch 0 Update 67 Cost 17467.9296875 UD 0.752150058746
Minibatch with zero sample under length 15
Minibatch with zero sample under length 15
Minibatch with zero sample under length 15
Minibatch with zero sample under length 15
Minibatch with zero sample under length 15
Epoch 0 Update 68 Cost 17452.5976562 UD 0.831667900085
Epoch 0 Update 69 Cost 17394.2402344 UD 0.73230099678
Epoch 0 Update 70 Cost 17384.1113281 UD 0.830217123032
Minibatch with zero sample under length 15
Epoch 0 Update 71 Cost 17374.1601562 UD 0.820451974869
Minibatch with zero sample under length 15
Minibatch with zero sample under length 15
Minibatch with zero sample under length 15
Minibatch with zero sample under length 15
Epoch 0 Update 72 Cost 17322.9296875 UD 0.877825975418
Epoch 0 Update 73 Cost 17319.2441406 UD 0.862649917603
Epoch 0 Update 74 Cost 17258.2480469 UD 0.820302963257
Minibatch with zero sample under length 15
Epoch 0 Update 75 Cost 17266.3398438 UD 0.854918003082
Minibatch with zero sample under length 15
So please help me with what I should do to ensure that the model gets trained on sentences of up to 100 words in length. Also, can you point out where the actual value 15 comes from?
Hi @hanskrupakar, by default the maxlen parameter is set to 50, as you can check here; please compare it with your fork. This value is passed to the data iterator, and TextIterator filters the sequences accordingly.

In your case, please check the average sequence length of your dataset. If your sequences are short on average, you may need to further adjust maxlen, or you could even introduce another hyper-parameter like minlen.
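As a rough illustration of the filtering described above (a simplified sketch, not the actual TextIterator from data_iterator.py; the minlen argument is the hypothetical extra hyper-parameter suggested above):

```python
# Simplified length filter for parallel text, in the spirit of the data iterator.
# Sentence pairs whose source or target length falls outside the bounds are
# skipped, so they never reach the training loop.
def filter_by_length(source_sents, target_sents, maxlen=50, minlen=1):
    kept_src, kept_trg = [], []
    for src, trg in zip(source_sents, target_sents):
        src_words = src.strip().split()
        trg_words = trg.strip().split()
        if len(src_words) > maxlen or len(trg_words) > maxlen:
            continue  # too long: dropped, like TextIterator does with maxlen
        if len(src_words) < minlen or len(trg_words) < minlen:
            continue  # too short: the hypothetical minlen filter
        kept_src.append(src_words)
        kept_trg.append(trg_words)
    return kept_src, kept_trg
```

With maxlen set too low for the corpus (e.g. 15 on long sentences), almost every pair is dropped, which is why so few samples survive per minibatch.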
'maxlen' is one of the parameters in 'train_nmt.py', set to 50 by default. I get the following message during the training process: "Minibatch with zero sample under length 100". Investigating the source code shows that this message appears when a minibatch contains no sample whose source and target lengths are below 'maxlen'. On the other hand, in 'data_iterator.py' training samples are already skipped when the length of the source or target is greater than 'maxlen'.
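For context, the "Minibatch with zero sample under length ..." message is produced by logic along these lines (a paraphrased sketch, not the exact code in the repository): prepare_data applies its own maxlen filter to the minibatch it receives, and if nothing survives it returns None and the batch is skipped.

```python
# Sketch of prepare_data-style filtering and how an empty minibatch is reported.
def prepare_data_sketch(seqs_x, seqs_y, maxlen=None):
    if maxlen is not None:
        kept = [(sx, sy) for sx, sy in zip(seqs_x, seqs_y)
                if len(sx) < maxlen and len(sy) < maxlen]
        if len(kept) == 0:
            return None, None      # nothing in this minibatch survived the filter
        seqs_x, seqs_y = zip(*kept)
    # padding and masking of the surviving pairs would happen here
    return seqs_x, seqs_y

# Hypothetical training-loop usage:
# x, y = prepare_data_sketch(batch_x, batch_y, maxlen=maxlen)
# if x is None:
#     print('Minibatch with zero sample under length', maxlen)
#     continue
```

Since data_iterator.py already skips over-length pairs, this amounts to the double filtering mentioned at the top of the thread.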