(must see) X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. In AISTATS, 2010.
(must see) R. Pascanu, T. Mikolov, and Y. Bengio. On the difficulty of training recurrent neural networks. In ICML, 2013, vol. 28(3), pp. 1310-1318.
vanishing and exploding gradients / sensitivity
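A minimal NumPy sketch of the topic, under a simplifying assumption: a purely linear recurrence h_t = W h_{t-1} with no nonlinearity. In that case the gradient through T steps is the matrix power W^T, so its norm shrinks or grows geometrically with the scale of W. It also sketches two remedies associated with the references: the normalized ("Xavier") uniform initialization from Glorot and Bengio, and gradient-norm clipping from Pascanu et al. The helper names (`glorot_uniform`, `jacobian_product_norm`, `clip_gradient_norm`) are hypothetical and not taken from either paper's code.

```python
import numpy as np


def glorot_uniform(fan_in: int, fan_out: int, rng: np.random.Generator) -> np.ndarray:
    """Normalized ("Xavier") uniform init: W ~ U[-a, a], a = sqrt(6 / (fan_in + fan_out)).

    This gives Var(W) = 2 / (fan_in + fan_out), a compromise between keeping
    forward activations and backward gradients at a similar scale across layers.
    """
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))


def jacobian_product_norm(W: np.ndarray, steps: int) -> float:
    """Spectral norm of d h_T / d h_0 for the linear recurrence h_t = W @ h_{t-1}.

    Assumption: no nonlinearity, so each step contributes the same Jacobian W
    and backprop through `steps` steps multiplies them into W**steps.
    """
    J = np.linalg.matrix_power(W, steps)
    return float(np.linalg.norm(J, ord=2))


def clip_gradient_norm(grad: np.ndarray, threshold: float) -> np.ndarray:
    """Rescale grad to L2 norm `threshold` if its norm exceeds it; keep its direction."""
    norm = np.linalg.norm(grad)
    if norm > threshold:
        return grad * (threshold / norm)
    return grad


rng = np.random.default_rng(0)
n = 32
# Orthogonal Q from a QR decomposition has all singular values equal to 1,
# so every singular value of scale * Q equals `scale`.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))

for scale in (0.9, 1.0, 1.1):
    norm = jacobian_product_norm(scale * Q, steps=50)
    print(f"scale={scale}: ||dh_50/dh_0|| = {norm:.4g}")

# Empirical variance should be close to 2 / (256 + 256).
W0 = glorot_uniform(256, 256, rng)
print("Glorot init variance:", W0.var(), "target:", 2.0 / 512)

# An exploding gradient is rescaled to the threshold but keeps its direction.
print(clip_gradient_norm(np.array([3.0, 4.0]), threshold=1.0))  # [0.6 0.8]
```

With scale 0.9 the 50-step gradient norm is 0.9^50 ≈ 0.005 (vanishing); with scale 1.1 it is 1.1^50 ≈ 117 (exploding). Clipping bounds the step size in the exploding case but does nothing for the vanishing case.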