yusuketomoto / chainer-char-rnn

karpathy's char-rnn (https://github.com/karpathy/char-rnn) implemented in Chainer
MIT License

Question: minibatch data is not contiguous? #19

Open

akitakeuchi commented 8 years ago

Hi,

Thank you for the great contribution. The program works fine with the tinyshakespeare dataset and other datasets; however, part of the "train.py" code looks quite strange to me. Lines 87-91:

```python
for i in xrange(jump * n_epochs):
    x_batch = np.array([train_data[(jump * j + i) % whole_len]
                        for j in xrange(batchsize)])
    y_batch = np.array([train_data[(jump * j + i + 1) % whole_len]
                        for j in xrange(batchsize)])
```

While "train_data" is the source character sequence, x_data seems to consist of characters from separate positions, that is, from every "jump" distant positions. To train RNN, internal state must be carried over to next input, but this minibatch data seems to violate this input data continuity. I would appreciate if you explain why the code works fine. Thanks.

benob commented 8 years ago

As far as I understand, a minibatch should in general process independent examples (so that its gradient is a good estimate of the global gradient). In an RNN, consecutive examples are not independent, but if we build the minibatch from characters that are far apart, we get a good approximation. The minibatch thus acts like a rake whose teeth are separated by the jump value and which is moved forward one character at a time, as the sketch below illustrates.
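A hypothetical toy illustration (same made-up values as above) of why state carry-over still works: viewed over time, batch slot j reads a contiguous stream starting at offset jump * j, so the hidden state kept for slot j always continues slot j's own stream.

```python
import numpy as np

# Made-up toy values, matching the sketch in the question above.
train_data = np.arange(20)
batchsize = 4
whole_len = len(train_data)
jump = whole_len // batchsize  # 5

# Collect the sequence of inputs that each batch slot j sees over the
# first `jump` steps: slot j walks through a contiguous chunk of text.
for j in range(batchsize):
    stream = [int(train_data[(jump * j + i) % whole_len])
              for i in range(jump)]
    print("slot", j, "reads", stream)
# slot 0 reads [0, 1, 2, 3, 4]
# slot 1 reads [5, 6, 7, 8, 9]
# slot 2 reads [10, 11, 12, 13, 14]
# slot 3 reads [15, 16, 17, 18, 19]
```

So continuity is preserved within each batch row; the rows are merely far apart from one another, which is what makes the minibatch gradient a reasonable estimate.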

akitakeuchi commented 8 years ago

Thank you for the comment. I got the point.