Open RiemanZeta opened 6 years ago
I was trying to train an LSTM where the max sequence length is greater than 64. It printed a message that said splits above a length of 64 are not supported yet. Any idea when this will be fixed?

Not in the near future. The gradients will vanish for long sequences anyway. A fairly standard practice is to run an LSTM on chunks of e.g. 50 tokens at a time, and then carry forward the cell and hidden state to the next chunk.
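For illustration, here is a minimal sketch of that chunking approach (truncated backpropagation through time) written in plain PyTorch rather than the library from this issue. The model, optimizer, chunk size, and tensor shapes are all assumptions chosen just to show the pattern of carrying the `(hidden, cell)` state across chunks while detaching it from the graph:

```python
import torch
import torch.nn as nn

# Hypothetical setup: sizes, model, and optimizer are illustrative only.
CHUNK_LEN = 50
lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
head = nn.Linear(64, 10)
optimizer = torch.optim.Adam(list(lstm.parameters()) + list(head.parameters()))
loss_fn = nn.CrossEntropyLoss()

def train_long_sequence(inputs, targets):
    """inputs: (batch, seq_len, 32); targets: (batch, seq_len) class ids."""
    state = None  # (h, c); None lets the LSTM start from zeros
    for start in range(0, inputs.size(1), CHUNK_LEN):
        x = inputs[:, start:start + CHUNK_LEN]
        y = targets[:, start:start + CHUNK_LEN]

        out, state = lstm(x, state)
        # Detach so gradients do not flow back past this chunk,
        # but the numerical state is still carried to the next one.
        state = tuple(s.detach() for s in state)

        loss = loss_fn(head(out).reshape(-1, 10), y.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

The `detach()` call is what bounds the backprop window to one chunk while still letting the forward state flow through the full sequence.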