ImageNet Classification with Deep Convolutional Neural Networks
Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton - NIPS, 2012.
CV, DL, CNN
3.1 Shows that ReLUs (non-saturating) converge much faster than saturating nonlinearities such as tanh and sigmoid: about six times faster to reach 25% training error on CIFAR-10 (without regularization).
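A minimal PyTorch sketch of the comparison, not the paper's CIFAR-10 setup: the same toy network trained once with ReLU and once with tanh, to illustrate the faster early convergence of the non-saturating unit. All names and the toy data are illustrative.

```python
import torch
import torch.nn as nn

def make_net(act):
    # Tiny fully-connected net; only the activation differs between runs.
    return nn.Sequential(nn.Linear(100, 256), act, nn.Linear(256, 10))

x = torch.randn(512, 100)
y = torch.randint(0, 10, (512,))

for name, act in [("relu", nn.ReLU()), ("tanh", nn.Tanh())]:
    net = make_net(act)
    opt = torch.optim.SGD(net.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    for step in range(200):
        opt.zero_grad()
        loss = loss_fn(net(x), y)
        loss.backward()
        opt.step()
    print(name, "final training loss:", loss.item())
```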
3.2 Trained on two GPUs. The parallelization scheme puts half of the kernels (neurons) on each GPU, with one additional trick: the GPUs communicate only in certain layers, and the connectivity pattern is chosen by cross-validation. This scheme reduces the top-1 and top-5 error rates by 1.7% and 1.2%, respectively.
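On a single device, the restricted cross-GPU connectivity can be approximated with grouped convolutions: a sketch (not the paper's exact two-GPU implementation), where `groups=2` means each half of the filters sees only half of the input maps, and an ungrouped layer plays the role of a "communication" layer.

```python
import torch.nn as nn

features = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4),              # layer 1: sees the full input
    nn.ReLU(inplace=True),
    nn.Conv2d(96, 256, kernel_size=5, padding=2, groups=2),  # layer 2: split, no cross-talk
    nn.ReLU(inplace=True),
    nn.Conv2d(256, 384, kernel_size=3, padding=1),           # layer 3: the "communication" layer
    nn.ReLU(inplace=True),
)
```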
3.3 Used Local Response Normalization. Response normalization reduces our top-1 and top-5 error rates by 1.4% and 1.2%, respectively. Blog: Normalization in Neural Network
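A sketch of LRN with the paper's hyper-parameters (window n = 5, k = 2, alpha = 1e-4, beta = 0.75). Note that some frameworks, PyTorch included, fold a 1/size factor into alpha, so this is an approximation of the paper's exact formula rather than an exact reproduction.

```python
import torch
import torch.nn as nn

lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)
x = torch.randn(1, 96, 55, 55)   # e.g. the conv1 output shape in AlexNet
y = lrn(x)
```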
3.4 Used overlapping pooling. This scheme reduces the top-1 and top-5 error rates by 0.4% and 0.3%, respectively.
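Overlapping pooling just means the pooling stride is smaller than the window (z = 3, s = 2 in the paper) instead of the traditional s = z. A minimal sketch; the input shape is only an example:

```python
import torch
import torch.nn as nn

overlap_pool = nn.MaxPool2d(kernel_size=3, stride=2)  # s < z: adjacent windows overlap
plain_pool = nn.MaxPool2d(kernel_size=2, stride=2)    # traditional non-overlapping pooling

x = torch.randn(1, 96, 55, 55)
print(overlap_pool(x).shape, plain_pool(x).shape)  # same 27x27 spatial size here
```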
4.1 Data augmentation: 32 × 32 × 2 = 2048 variations of each image from translations (224×224 crops of the 256×256 images) and horizontal reflections, plus jittering the RGB pixel values along their principal components (PCA). The PCA colour scheme reduces the top-1 error rate by over 1%.
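A NumPy sketch of both augmentations. In the paper the PCA is computed over all training-set RGB pixels; here it is computed on a single random image purely so the snippet is self-contained, and the function names are illustrative.

```python
import numpy as np

def random_crop_flip(img, size=224):
    # Random 224x224 patch plus an optional horizontal reflection.
    h, w, _ = img.shape
    top = np.random.randint(0, h - size + 1)
    left = np.random.randint(0, w - size + 1)
    patch = img[top:top + size, left:left + size]
    return patch[:, ::-1] if np.random.rand() < 0.5 else patch

def pca_color_jitter(img, eigvecs, eigvals, sigma=0.1):
    # Add multiples of the RGB principal components, scaled by the
    # corresponding eigenvalue times a N(0, 0.1) draw, to every pixel.
    alphas = np.random.normal(0.0, sigma, size=3)
    shift = eigvecs @ (alphas * eigvals)   # 3-vector added to all pixels
    return img + shift

img = np.random.rand(256, 256, 3)
eigvals, eigvecs = np.linalg.eigh(np.cov(img.reshape(-1, 3), rowvar=False))
out = pca_color_jitter(random_crop_flip(img), eigvecs, eigvals)
```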
4.2 Used dropout (p = 0.5) in the first two fully-connected layers. At test time all neurons are used and their outputs are multiplied by 0.5, approximating the geometric mean of the exponentially many dropout networks.
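A minimal sketch of the paper's scheme (modern "inverted" dropout, e.g. `nn.Dropout`, rescales at training time instead, so no test-time scaling is needed there):

```python
import numpy as np

def dropout_train(activations, p=0.5):
    mask = np.random.rand(*activations.shape) >= p
    return activations * mask            # dropped units contribute nothing

def dropout_test(activations, p=0.5):
    return activations * (1.0 - p)       # paper: multiply outputs by 0.5

h = np.random.rand(4, 8)
print(dropout_train(h).mean(), dropout_test(h).mean())
```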
5 We initialized the neuron biases in the second, fourth, and fifth convolutional layers, as well as in the fully-connected hidden layers, with the constant 1. This initialization accelerates the early stages of learning by providing the ReLUs with positive inputs.
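A sketch of that initialization for one of the affected layers, combined with the paper's zero-mean Gaussian (std 0.01) weight init; the layer shapes are those of AlexNet's conv2 but the variable names are illustrative.

```python
import torch.nn as nn

conv2 = nn.Conv2d(96, 256, kernel_size=5, padding=2, groups=2)
nn.init.normal_(conv2.weight, mean=0.0, std=0.01)  # weights ~ N(0, 0.01)
nn.init.constant_(conv2.bias, 1.0)                 # bias = 1 so ReLUs start with positive inputs
```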