jellAIfish / jellyfish

This repository is inspired by Quinn Liu's repository Walnut.

A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification #45

Closed markroxor closed 6 years ago

markroxor commented 6 years ago

https://arxiv.org/pdf/1510.03820.pdf

markroxor commented 6 years ago

Convolutional Neural Networks (CNNs) have recently achieved remarkably strong performance on the practically important task of sentence classification (Kim, 2014; Kalchbrenner et al., 2014; Johnson and Zhang, 2014).

However, these models require practitioners to specify an exact model architecture and set accompanying hyperparameters, including the filter region size, regularization parameters, and so on.

markroxor commented 6 years ago

Kim (2014), for example, proposed a simple one-layer CNN that achieved state-of-the-art (or comparable) results across several datasets. The very strong results achieved with this comparatively simple CNN architecture suggest that it may serve as a drop-in replacement for well-established baseline models, such as SVM (Joachims, 1998) or logistic regression.

Furthermore, in practice exploring the space of possible configurations for this model is extremely expensive, for two reasons: (1) training these models is relatively slow, even using GPUs. For example, on the SST-1 dataset (Socher et al., 2013), it takes about 1 hour to run 10-fold cross validation, using a similar configuration to that described in (Kim, 2014). (2) The space of possible model architectures and hyperparameter settings is vast. Indeed, the simple CNN architecture we consider requires, at a minimum, specifying: input word vector representations; filter region size(s); the number of feature maps; the activation function(s); the pooling strategy; and regularization terms (dropout/l2).
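To make the size of that search space concrete, here is a minimal sketch (with hypothetical option values, not the authors' exact grid) of the configuration dimensions listed above; even a coarse grid over them multiplies into hundreds of candidate models:

```python
# Hypothetical sketch of the one-layer CNN's hyperparameter space.
# The option values below are illustrative, not the paper's exact grid.
from itertools import product

search_space = {
    "word_vectors": ["word2vec", "glove"],           # input representations
    "filter_region_sizes": [(3,), (3, 4, 5), (7,)],  # filter region size(s)
    "num_feature_maps": [100, 400, 600],             # feature maps per size
    "activation": ["relu", "tanh", "iden"],          # activation function(s)
    "pooling": ["1-max", "avg"],                     # pooling strategy
    "dropout_rate": [0.0, 0.5],                      # regularization: dropout
    "l2_norm_constraint": [3.0, None],               # regularization: l2
}

keys = list(search_space)
configs = [dict(zip(keys, vals)) for vals in product(*search_space.values())]
print(len(configs))  # 432 configurations from this coarse grid alone
```

At roughly one hour per 10-fold cross-validation run on SST-1, exhaustively evaluating even this toy grid would take weeks, which is why the paper resorts to a sensitivity analysis instead.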

markroxor commented 6 years ago

2.1 CNN Architecture

We begin with a tokenized sentence which we then convert to a sentence matrix, the rows of which are word vector representations of each token. These might be, e.g., outputs from trained word2vec (Mikolov et al., 2013) or GloVe (Pennington et al., 2014) models.

We can then effectively treat the sentence matrix as an ‘image’, and perform convolution on it via linear filters.
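The sentence-matrix-as-image idea can be sketched in a few lines of numpy. This is a toy illustration with made-up dimensions (7 tokens, 5-d embeddings, a region size of 3), not the paper's code: a single linear filter spans the full embedding width and slides over consecutive token windows, producing one feature value per window.

```python
# Toy sketch: convolving a linear filter over a sentence matrix.
# Dimensions and the random "embeddings" are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

sentence_len, embed_dim = 7, 5  # 7 tokens, 5-d word vectors
region_size = 3                 # filter covers 3 consecutive tokens

# Sentence matrix: one row per token (e.g. a word2vec/GloVe vector).
S = rng.normal(size=(sentence_len, embed_dim))

# One linear filter spanning `region_size` rows and the full width.
W = rng.normal(size=(region_size, embed_dim))

# "Convolution": elementwise product + sum over each token window.
feature_map = np.array([
    np.sum(S[i:i + region_size] * W)
    for i in range(sentence_len - region_size + 1)
])

print(feature_map.shape)       # (5,): one value per token window
c_hat = feature_map.max()      # 1-max pooling over the feature map
```

Because the filter width equals the embedding dimension, the convolution only slides vertically over token positions, which is what distinguishes this setup from 2-D convolution over real images.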

[screenshot from 2018-01-18 17-20-38: CNN architecture figure]

markroxor commented 6 years ago

Not the right place!