Good survey paper to get an introduction to Deep Learning. It provides the history and an overview of how Deep Learning evolved. The authors give an introduction to supervised learning and then present the biological motivations behind these architectures. They also explain how the CNN and RNN architectures work and showcase some successful applications.
This landmark paper in CNN architecture showed remarkable results in the 2012 ILSVRC challenge, finishing ahead of the runner-up by about 10.8 percentage points in top-5 error. Their main contributions are the use of ReLU activations instead of tanh and the use of dropout to handle overfitting.
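A minimal sketch of these two ingredients, assuming PyTorch and illustrative layer sizes rather than the paper's actual architecture:

import torch
import torch.nn as nn

# Not the authors' code: a tiny conv net using ReLU activations and
# dropout on the fully connected part, the two points noted above.
class TinyConvNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),          # ReLU instead of tanh
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(p=0.5),              # dropout to reduce overfitting
            nn.Linear(32 * 16 * 16, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Example: a batch of 32x32 RGB images.
logits = TinyConvNet()(torch.randn(4, 3, 32, 32))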
This paper was the runner-up of the 2014 ILSVRC challenge. The architecture is very uniform (composed of a few modules, each module consisting of a few convolution layers with a pooling layer at the end). This model has a large number of parameters and is therefore more challenging to handle, but it is very appealing for feature extraction.
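A minimal sketch of one such uniform module, assuming PyTorch; channel counts and depths are illustrative, not the paper's exact configurations:

import torch
import torch.nn as nn

# One module: a few 3x3 convolutions followed by a pooling layer.
def vgg_block(in_channels, out_channels, num_convs):
    layers = []
    for _ in range(num_convs):
        layers += [nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
                   nn.ReLU(inplace=True)]
        in_channels = out_channels
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

# Stacking these uniform blocks halves the spatial size each time.
block = vgg_block(3, 64, num_convs=2)
out = block(torch.randn(1, 3, 224, 224))   # -> (1, 64, 112, 112)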
Winner of the 2014 ILSVRC challenge, achieving a 6.67% top-5 error rate. The main contribution is the Inception module, which drastically reduces the number of parameters. This allows for building large networks; the winning network had 22 layers.
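A minimal sketch of an Inception-style module, assuming PyTorch; channel numbers are illustrative, not the paper's exact values. The idea is to run 1x1, 3x3 and 5x5 convolutions plus a pooling branch in parallel, with 1x1 convolutions used as bottlenecks to cut the parameter count:

import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    def __init__(self, in_ch, c1, c3_reduce, c3, c5_reduce, c5, pool_proj):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, c1, kernel_size=1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, c3_reduce, kernel_size=1),   # 1x1 bottleneck
            nn.Conv2d(c3_reduce, c3, kernel_size=3, padding=1),
        )
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, c5_reduce, kernel_size=1),   # 1x1 bottleneck
            nn.Conv2d(c5_reduce, c5, kernel_size=5, padding=2),
        )
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, kernel_size=1),
        )

    def forward(self, x):
        # Concatenate the four branches along the channel dimension.
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)

m = InceptionModule(192, 64, 96, 128, 16, 32, 32)
out = m(torch.randn(1, 192, 28, 28))   # -> (1, 256, 28, 28)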
This paper won the 2015 ILSVRC challenge. Its main contribution is skip connections, which let gradients bypass certain layers and flow directly to layers much earlier or later in the network. This allowed the authors to build very deep networks, up to 152 layers.
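A minimal sketch of a basic residual block, assuming PyTorch; the identity skip connection adds the block's input back to its output, giving gradients a shortcut around the convolutions:

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # identity skip connection

block = ResidualBlock(64)
out = block(torch.randn(1, 64, 56, 56))   # same shape as the input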