redpanda-ai / Meerkat

Used for the Meerkat project

Text Understanding from Scratch #122

Closed speakerjohnash closed 9 years ago

speakerjohnash commented 9 years ago

Crepe code: https://github.com/zhangxiangxiao/Crepe
Paper: http://arxiv.org/pdf/1502.01710v2.pdf
Torch NN documentation: https://github.com/torch/nn/tree/master/doc

Three-year samples:
s3://s3yodlee/development/card/3_year_card_sample.txt
s3://s3yodlee/development/bank/3_year_bank_sample.txt

AMI: ami-f5715ac5

redpanda-ai commented 9 years ago

:+1:

speakerjohnash commented 9 years ago

Now if we could just get anyone to care about it

redpanda-ai commented 9 years ago

Excellent. Hey, we just doubled the number of people who care about it!

redpanda-ai commented 9 years ago

Glossary of terms


co-adaptation: When a large feedforward neural network is trained on a small training set, it typically performs poorly on held-out test data. This "overfitting" is greatly reduced by randomly omitting half of the feature detectors on each training case. This prevents complex co-adaptations in which a feature detector is only helpful in the context of several other specific feature detectors. Instead, each neuron learns to detect a feature that is generally helpful for producing the correct answer given the combinatorially large variety of internal contexts in which it must operate. Random "dropout" gives big improvements on many benchmark tasks and sets new records for speech and object recognition.

dropout: Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
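
To make the mechanics concrete, here is a minimal NumPy sketch of standard dropout as described above (not the Torch/Crepe implementation); the drop probability of 0.5 and the layer shapes are just illustrative assumptions.

```python
import numpy as np

def dropout_forward(activations, drop_prob=0.5, train=True, rng=None):
    """Standard (non-inverted) dropout, as described above.

    During training, each unit is zeroed out independently with
    probability `drop_prob`; at test time the single "unthinned"
    network is used, with activations scaled by (1 - drop_prob) so
    their expected magnitude matches what training saw.
    """
    rng = rng or np.random.default_rng()
    if train:
        mask = rng.random(activations.shape) >= drop_prob
        return activations * mask
    # test time: approximate the average of all thinned networks
    return activations * (1.0 - drop_prob)

# toy usage on a batch of 4 examples with 6 hidden units
hidden = np.ones((4, 6))
print(dropout_forward(hidden, train=True))   # roughly half the units zeroed
print(dropout_forward(hidden, train=False))  # all units, scaled by 0.5
```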

sentiment_polarity: scanning text to determine whether the author feels positively or negatively about what they are writing.

convolution: Convolution is a mathematical term, defined as applying a function repeatedly across the output of another function. In this context it means to apply a 'filter' over an image at all possible offsets. A filter consists of a layer of connection weights, with the input being the size of a small 2D image patch and the output being a single unit. Since this filter is applied repeatedly, the resulting connectivity looks like a series of overlapping receptive fields, as shown in the 'sparse connectivity' image, which map to a matrix of the filter outputs (or several such matrices in the common case of using a bank of several filters).

An important subtlety here is that while there are still a good number of connections between the input layer and the filter output layer, the weights are tied together (as shown in the colored diagram). This means that during backpropagation, you only have to adjust a number of parameters equal to a single instance of the filter -- a drastic reduction from the typical FFNN architecture.

Another nuance is that we could sensibly apply such a filter to any input that is spatially organized, not just a picture. This means that we could add another bank of filters directly on top of our first filter bank's output. However, since the dimensionality of applying a filter is equal to the input dimensionality, we wouldn't be gaining any translation invariance with these additional filters; we'd be stuck doing pixel-wise analysis on increasingly abstract features. In order to solve this problem, we must introduce a new sort of layer: a subsampling layer.
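
To ground the tied-weights idea, here is a minimal NumPy sketch of sliding a single filter over a 2D input at every valid offset (strictly speaking a cross-correlation, as most deep-learning libraries implement it); the 5x5 input and 3x3 filter are illustrative assumptions.

```python
import numpy as np

def conv2d_single_filter(image, kernel):
    """Slide one filter (a single set of tied weights) over a 2D input.

    Every output unit is produced by the same kernel, so however many
    connections the diagram shows, there are only kernel.size weights
    to learn. (True convolution would flip the kernel first; this is
    the cross-correlation form.)
    """
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + kh, x:x + kw]
            out[y, x] = np.sum(patch * kernel)
    return out

# toy usage: a 5x5 "image" and a 3x3 vertical-edge-like filter
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1., 0., -1.]] * 3)
print(conv2d_single_filter(image, kernel))   # 3x3 map of filter responses
```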

The use of tilde "~"

In statistics, the tilde is frequently used to mean "has the distribution (of)"; for instance, X ∼ N(0,1) means the stochastic (random) variable X has the distribution N(0,1) (the standard normal distribution). If X and Y are stochastic variables, then X ∼ Y means X has the same distribution as Y.

max-pooling: After each convolutional layer, there may be a pooling layer. The pooling layer takes small rectangular blocks from the convolutional layer and subsamples each block to produce a single output. There are several ways to do this pooling, such as taking the average or the maximum, or a learned linear combination of the neurons in the block. Our pooling layers will always be max-pooling layers; that is, they take the maximum of the block they are pooling.

Another important concept of CNNs is max-pooling, which is a form of non-linear down-sampling. Max-pooling partitions the input image into a set of non-overlapping rectangles and, for each such sub-region, outputs the maximum value.

Max-pooling is useful in vision for two reasons:

    By eliminating non-maximal values, it reduces computation for upper layers.

    It provides a form of translation invariance. Imagine cascading a max-pooling layer with a convolutional layer. There are 8 directions in which one can translate the input image by a single pixel. If max-pooling is done over a 2x2 region, 3 out of these 8 possible configurations will produce exactly the same output at the convolutional layer. For max-pooling over a 3x3 window, this jumps to 5/8.

    Since it provides additional robustness to position, max-pooling is a “smart” way of reducing the dimensionality of intermediate representations.

Max-pooling is done in Theano by way of theano.tensor.signal.downsample.max_pool_2d. This function takes as input an N dimensional tensor (where N >= 2) and a downscaling factor and performs max-pooling over the 2 trailing dimensions of the tensor.
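
For readers without Theano handy, here is a plain NumPy sketch of the same non-overlapping 2x2 max-pooling; the 4x4 feature map and the trimming of odd borders are illustrative assumptions, not the tutorial's exact behavior.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max-pooling over the two trailing dimensions.

    Each 2x2 block of the input contributes a single value (its maximum)
    to the output, halving both spatial dimensions and providing the
    small amount of translation invariance described above.
    """
    h, w = feature_map.shape[-2:]
    h, w = h - h % 2, w - w % 2               # drop odd borders (illustrative choice)
    trimmed = feature_map[..., :h, :w]
    blocks = trimmed.reshape(*feature_map.shape[:-2], h // 2, 2, w // 2, 2)
    return blocks.max(axis=(-3, -1))

# toy usage on a single 4x4 feature map
fm = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool_2x2(fm))   # [[ 5.  7.]
                          #  [13. 15.]]
```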

redpanda-ai commented 9 years ago

This information is probably better on our Wiki. Closing this issue; the content is still available if you filter all issues by the "learning" label.