In the article you wrote: "Our LSTM has two layers and is unrolled for 50 steps in both experiments. It has 400 cells per layer and its parameters are initialized uniformly in [−0.08, 0.08]." I don't understand how you backpropagate and compute the cross-entropy loss within those 50 unrolled steps when the input is still being fed in and there is no output yet. Could you help me?