jiongye opened this issue 8 years ago (status: Open)
The code assumes that the sequences at the same row position of consecutive batches are contiguous in the original text. In your case, the first sequence of batches 1, 2, 3, ... is [0,1,2,3,4], [5,6,7,8,9], [10,11,12,13,14]; these are sequential, so they can share the same hidden state across batches during training.
The following lines transform `xdata` into tensors with the correct dimensions, but the output data are no longer in the correct order:

```python
self.x_batches = np.split(xdata.reshape(self.batch_size, -1), self.num_batches, 1)
self.y_batches = np.split(ydata.reshape(self.batch_size, -1), self.num_batches, 1)
```
I think the correct transformation should be the following:

```python
self.x_batches = xdata.reshape(-1, self.batch_size, self.seq_length)
self.y_batches = ydata.reshape(-1, self.batch_size, self.seq_length)
```
Here is an example:
```python
xdata = np.array(range(100))

xdata => array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14,
                15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
                30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
                45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
                60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
                75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
                90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
```
```python
batch_size = 5
seq_length = 5
num_batches = 4
```
```python
m = np.split(xdata.reshape(batch_size, -1), num_batches, 1)

m => [array([[ 0,  1,  2,  3,  4],
             [20, 21, 22, 23, 24],
             [40, 41, 42, 43, 44],
             [60, 61, 62, 63, 64],
             [80, 81, 82, 83, 84]]),
      array([[ 5,  6,  7,  8,  9],
             [25, 26, 27, 28, 29],
             [45, 46, 47, 48, 49],
             [65, 66, 67, 68, 69],
             [85, 86, 87, 88, 89]]),
      array([[10, 11, 12, 13, 14],
             [30, 31, 32, 33, 34],
             [50, 51, 52, 53, 54],
             [70, 71, 72, 73, 74],
             [90, 91, 92, 93, 94]]),
      array([[15, 16, 17, 18, 19],
             [35, 36, 37, 38, 39],
             [55, 56, 57, 58, 59],
             [75, 76, 77, 78, 79],
             [95, 96, 97, 98, 99]])]
```
and
```python
n = xdata.reshape(-1, batch_size, seq_length)

n => array([[[ 0,  1,  2,  3,  4],
             [ 5,  6,  7,  8,  9],
             [10, 11, 12, 13, 14],
             [15, 16, 17, 18, 19],
             [20, 21, 22, 23, 24]],
```
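To make the difference between the two orderings concrete, here is a small standalone sketch (not code from the repo) using the same toy numbers as above. It checks that with `np.split` a fixed row position is contiguous *across* consecutive batches, while with the plain `reshape` rows are contiguous *within* a batch but not across batches:

```python
import numpy as np

# Toy data matching the example above: 100 tokens split into
# 4 batches of 5 sequences, each of length 5.
xdata = np.arange(100)
batch_size, seq_length, num_batches = 5, 5, 4

# Transformation currently in the code.
m = np.split(xdata.reshape(batch_size, -1), num_batches, 1)

# Transformation proposed in this issue.
n = xdata.reshape(-1, batch_size, seq_length)

# np.split: the sequence at row `row` of batch b+1 continues the
# sequence at row `row` of batch b (e.g. [0..4] then [5..9]),
# which is what allows carrying the hidden state between batches.
for b in range(num_batches - 1):
    for row in range(batch_size):
        assert m[b + 1][row][0] == m[b][row][-1] + 1

# reshape: consecutive rows *within* one batch are contiguous instead
# (e.g. batch 0 is [0..4], [5..9], ..., [20..24]).
for b in range(num_batches):
    for row in range(batch_size - 1):
        assert n[b][row + 1][0] == n[b][row][-1] + 1

# But the same row position is NOT contiguous from one batch to the
# next: row 0 of batch 0 ends at 4, while row 0 of batch 1 starts at 25.
assert n[1][0][0] != n[0][0][-1] + 1
```

So whether the `reshape` version is "correct" depends on whether training carries the hidden state across batches: if it does, the `np.split` ordering is the one that matches that assumption.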