Open zwd13122889 opened 4 years ago
From the paper:
As the master gates only focus on coarse-grained control, modeling them with the same dimensions as the hidden states is computationally expensive and unnecessary. In practice, we set f_t and i_t to be D/C dimensional vectors, where D is the dimension of hidden state, and C is a chunk size factor. We repeat each dimension C times, before the element-wise multiplication with f_t and i_t. The downsizing significantly reduces the number of extra parameters that we need to add to the LSTM. Therefore, every neuron within each C-sized chunk shares the same master gates.
What does chunk_size mean here? How does it relate to the chunk size factor C in the paper?
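If it helps, here is a minimal pure-Python sketch of the repeat-and-multiply step the quote describes. The concrete values of D, C, and the gate entries are illustrative, not taken from the paper or the repo; `chunk_size` here is assumed to play the role of C:

```python
D = 12           # dimension of the hidden state
chunk_size = 4   # chunk size factor C: neurons per chunk sharing one master gate

# The master gate (e.g. f_t) is only D/C-dimensional -- coarse-grained control.
f_master = [0.2, 0.5, 0.9]          # D // chunk_size = 3 values

# Repeat each master-gate value C times so it lines up with the D-dim hidden state.
f_expanded = [g for g in f_master for _ in range(chunk_size)]  # length D

hidden = [1.0] * D
gated = [f * h for f, h in zip(f_expanded, hidden)]  # element-wise multiplication

# Every neuron inside a C-sized chunk sees the same master-gate value:
# f_expanded == [0.2, 0.2, 0.2, 0.2, 0.5, 0.5, 0.5, 0.5, 0.9, 0.9, 0.9, 0.9]
```

So the master gates cost only D/C extra parameters per gate instead of D, at the price of all C neurons in a chunk being opened or closed together.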