sherjilozair / char-rnn-tensorflow

Multi-layer Recurrent Neural Networks (LSTM, RNN) for character-level language models in Python using TensorFlow
MIT License

create_batches in TextLoader in utils.py doesn't seem to transform the data into batches correctly #17


jiongye commented 8 years ago

The following lines transform xdata into tensors with the correct dimensions, but the output data are no longer in the correct order:

    self.x_batches = np.split(xdata.reshape(self.batch_size, -1), self.num_batches, 1)
    self.y_batches = np.split(ydata.reshape(self.batch_size, -1), self.num_batches, 1)

I think the correct transformation should be the following:

    self.x_batches = xdata.reshape(-1, self.batch_size, self.seq_length)
    self.y_batches = ydata.reshape(-1, self.batch_size, self.seq_length)

Here is an example:

    xdata = np.array(range(100))
    xdata
    => array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
               17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
               34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
               51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
               68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
               85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])

    batch_size = 5
    seq_length = 5
    num_batches = 4

    m = np.split(xdata.reshape(batch_size, -1), num_batches, 1)
    m
    => [array([[ 0,  1,  2,  3,  4],
               [20, 21, 22, 23, 24],
               [40, 41, 42, 43, 44],
               [60, 61, 62, 63, 64],
               [80, 81, 82, 83, 84]]),
        array([[ 5,  6,  7,  8,  9],
               [25, 26, 27, 28, 29],
               [45, 46, 47, 48, 49],
               [65, 66, 67, 68, 69],
               [85, 86, 87, 88, 89]]),
        array([[10, 11, 12, 13, 14],
               [30, 31, 32, 33, 34],
               [50, 51, 52, 53, 54],
               [70, 71, 72, 73, 74],
               [90, 91, 92, 93, 94]]),
        array([[15, 16, 17, 18, 19],
               [35, 36, 37, 38, 39],
               [55, 56, 57, 58, 59],
               [75, 76, 77, 78, 79],
               [95, 96, 97, 98, 99]])]

and

    n = xdata.reshape(-1, batch_size, seq_length)
    n
    => array([[[ 0,  1,  2,  3,  4],
               [ 5,  6,  7,  8,  9],
               [10, 11, 12, 13, 14],
               [15, 16, 17, 18, 19],
               [20, 21, 22, 23, 24]],

              [[25, 26, 27, 28, 29],
               [30, 31, 32, 33, 34],
               [35, 36, 37, 38, 39],
               [40, 41, 42, 43, 44],
               [45, 46, 47, 48, 49]],

              [[50, 51, 52, 53, 54],
               [55, 56, 57, 58, 59],
               [60, 61, 62, 63, 64],
               [65, 66, 67, 68, 69],
               [70, 71, 72, 73, 74]],

              [[75, 76, 77, 78, 79],
               [80, 81, 82, 83, 84],
               [85, 86, 87, 88, 89],
               [90, 91, 92, 93, 94],
               [95, 96, 97, 98, 99]]])
nijianmo commented 7 years ago

The code assumes that the sequences at the same position in consecutive batches are contiguous in the original data. In your example, the first sequences of batches 1, 2, 3, ... are [0,1,2,3,4], [5,6,7,8,9], [10,11,12,13,14], which are sequential, so they can share the same hidden state across batches during training.
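
To make this concrete, here is a minimal runnable sketch using the same toy xdata and parameters as above (the concatenation check is only illustrative and does not appear in utils.py):

    import numpy as np

    xdata = np.arange(100)
    batch_size, seq_length, num_batches = 5, 5, 4

    # Original layout: reshape to (batch_size, -1), then split along time.
    m = np.split(xdata.reshape(batch_size, -1), num_batches, 1)

    # Row i of successive batches forms one contiguous stream, so the RNN's
    # hidden state for row i can be carried from one batch to the next.
    print(np.concatenate([batch[0] for batch in m]))
    # [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]

    # Proposed layout: row i of successive batches is not contiguous,
    # so carrying the hidden state across batches would mix unrelated text.
    n = xdata.reshape(-1, batch_size, seq_length)
    print(np.concatenate([batch[0] for batch in n]))
    # [ 0  1  2  3  4 25 26 27 28 29 50 51 52 53 54 75 76 77 78 79]

In other words, the np.split layout trades within-batch ordering for across-batch continuity, which is what carrying hidden state between batches requires.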