keras-team / keras

Deep Learning for humans
http://keras.io/

Example of How to Construct 1D Convolutional Net on Text #233

Closed simonhughes22 closed 9 years ago

simonhughes22 commented 9 years ago

I'd like to use keras to build a 1D convolutional net with pooling layers on some textual input, but I can't figure out the right input format and the right number of incoming connections above the flatten layer. Would you be able to provide a simple example using one of the data sets?

Awesome work by the way, great library.

lukedeo commented 9 years ago

You actually should probably use a 2D convolution, depending on what you're trying to do. If you have word vectors of size wv_sz, and you truncate/pad each sentence to have nb_tokens tokens, you can form a "sentence image" of size (nb_tokens, wv_sz). You can then choose 1 or more n-gram sizes to use as a filter. As long as you make sure your X has shape (nb_examples, 1, nb_tokens, wv_sz), you can use something like

model.add(Convolution2D(nb_feature_maps, 1, n_gram, wv_sz))
model.add(MaxPooling2D(poolsize=(nb_tokens - n_gram + 1, 1)))
model.add(Flatten())
model.add(WhateverLayer(nb_feature_maps, nb_outputs))
model.add(...)

If you want to add diversity, you can also do something like a Merge on a list of conv-pool sub-models. Here is an example of something I've successfully used on a sentiment classifier:

ngram_filters = [3, 4, 5, 6, 7, 8]
conv_filters = []

for n_gram in ngram_filters:
    conv_filters.append(Sequential())
    conv_filters[-1].add(Convolution2D(nb_feature_maps, 1, n_gram, wv_sz))
    conv_filters[-1].add(MaxPooling2D(poolsize=(nb_tokens - n_gram + 1, 1)))
    conv_filters[-1].add(Flatten())

model = Sequential()
model.add(Merge(conv_filters, mode='concat'))
model.add(Dropout(0.5))
model.add(Dense(nb_feature_maps * len(ngram_filters), 1))
model.add(Activation('sigmoid'))
fchollet commented 9 years ago

2D convolution only really makes sense under the assumption that the input is spatially continuous over both dimensions (like pictures). A sentence is continuous over time, but not over the wv_sz dimension (unless you are using a kind of word/character embedding that is dense and continuous).

Thinking about it, a 2D conv over a kind of dense and continuous character embedding sounds like the best way to process text. But anyway, if your character embedding is sparse and arbitrary, a 1D conv makes sense.

simonhughes22 commented 9 years ago

Thanks. @fchollet could you provide a quick example of how to do the 1D convolution on some of your textual data?

fchollet commented 9 years ago

@simonhughes22 It's not something I've done before, but I can look into it (I'm a bit busy, so no promises). Is there a paper in particular that you are trying to reproduce?

simonhughes22 commented 9 years ago

Sure that would be awesome. Reproducing this excellent Zhang and LeCun paper would be great: http://arxiv.org/abs/1502.01710

simonhughes22 commented 9 years ago

or doing the same with words if characters are too slow. My dataset is not large, but an LSTM vastly outperformed more vanilla classification methods, and I am hoping a convolutional network on the same task would be an interesting comparison, and may work well.

fchollet commented 9 years ago

I believe they are using 2D convolutions, and interestingly they are doing it over a sparse discontinuous character embedding. You can do better.

Here is my suggestion: 2D convolutions over a continuous dense character embedding space learned jointly with the main task. Makes much more sense than the braille-like input Zhang and LeCun are using.

# input: 2D tensor of integer indices of characters (eg. 1-57). 
# input tensor has shape (samples, maxlen)
model = Sequential()
model.add(Embedding(max_features, 256)) # embed into dense 3D float tensor (samples, maxlen, 256)
model.add(Reshape(1, maxlen, 256)) # reshape into 4D tensor (samples, 1, maxlen, 256)
# VGG-like convolution stack
model.add(Convolution2D(32, 3, 3, 3, border_mode='full')) 
model.add(Activation('relu'))
model.add(Convolution2D(32, 32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(poolsize=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
# then finish up with Dense layers

Warning: untested, etc.

simonhughes22 commented 9 years ago

Awesome thanks I will try it out

lukedeo commented 9 years ago

My 2D example with word/character vectors was geared towards the continuous embeddings -- in the style of this paper if anyone's interested.

simonhughes22 commented 9 years ago

Cool thanks @lukedeo

fchollet commented 9 years ago

@lukedeo cool, makes sense!

simonhughes22 commented 9 years ago

@lukedeo I'd like to try your example with different sized filters, but I can't figure out the correct output dimension size; your example above doesn't seem to work. My input dimensions are

(number of sequences, 1, max number of tokens (padded to max len), word vector size). I am using a one-hot encoded vector to make it simpler, but that shouldn't matter. I get the following error, using the model described by you above:

results = model.fit(X_train, y_train, batch_size=batch_size, nb_epoch=epochs, validation_split=0.0, show_accuracy=True, verbose=1)
  File "build/bdist.macosx-10.6-x86_64/egg/keras/models.py", line 204, in fit
  File "/Users/simon.hughes/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/theano/compile/function_module.py", line 513, in call
    allow_downcast=s.allow_downcast)
  File "/Users/simon.hughes/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/theano/tensor/type.py", line 169, in filter
    data.shape))
TypeError: ('Bad input argument to theano function with name "build/bdist.macosx-10.6-x86_64/egg/keras/models.py:104" at index 1(0-based)', 'Wrong number of dimensions: expected 4, got 2 with shape (16, 1).')

Process finished with exit code 1

Did you do anything else to transform the model or input? 16 is my mini-batch size I believe.

ameasure commented 9 years ago

@simonhughes22 That error suggests one of the layers is getting a 2 dimensional input where it was expecting 4-D. Convolution layers expect 4-D input so you may need to reshape something somewhere.

An example of using a CNN for text classification is below. Note that I have already concatenated the embeddings of the words as a preprocessing step. In particular:

# Convolution layers expect a 4-D input so we reshape our 2-D input
nb_samples = X_train.shape[0]
nb_features = X_train.shape[1]
newshape = (nb_samples, 1, nb_features, 1)
X_train = np.reshape(X_train, newshape).astype(theano.config.floatX)

# We set some hyperparameters
BATCH_SIZE = 16
FIELD_SIZE = 5 * 300
STRIDE = 300
N_FILTERS = 200

# We fit the model
model = Sequential()
model.add(Convolution2D(nb_filter=N_FILTERS, stack_size=1, nb_row=FIELD_SIZE, 
                        nb_col=1, subsample=(STRIDE, 1)))
model.add(Activation('relu'))
model.add(MaxPooling2D(poolsize=(((nb_features - FIELD_SIZE) / STRIDE) + 1, 1)))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(N_FILTERS, nb_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adadelta')
print 'fitting model'
model.fit(X_train, Y_train, nb_epoch=10, batch_size=BATCH_SIZE, verbose=1, 
          show_accuracy=True, validation_split=0.1)
simonhughes22 commented 9 years ago

@ameasure it was actually a theano bug with the concatenation feature, as @fchollet kindly pointed out under a different issue. I got bleeding-edge theano and that solved the issue. Thank you for your example. I got @lukedeo's example to work with the newer theano; that's actually very powerful. @ameasure wouldn't you be better off seeding the embedding layer with your pre-trained vectors and allowing it to fine-tune them for your task? This is how I ended up constructing my model:

nb_feature_maps = 32
embedding_size = 64

#ngram_filters = [3, 5, 7, 9]
ngram_filters = [2, 4, 6, 8]
conv_filters = []

for n_gram in ngram_filters:
    sequential = Sequential()
    conv_filters.append(sequential)

    sequential.add(Embedding(max_features, embedding_size))
    sequential.add(Reshape(1, maxlen, embedding_size))
    sequential.add(Convolution2D(nb_feature_maps, 1, n_gram, embedding_size))
    sequential.add(Activation("relu"))
    sequential.add(MaxPooling2D(poolsize=(maxlen - n_gram + 1, 1)))
    sequential.add(Flatten())

model = Sequential()
model.add(Merge(conv_filters, mode='concat'))
model.add(Dropout(0.5))
model.add(Dense(nb_feature_maps * len(conv_filters), 1))
model.add(Activation("sigmoid"))

I also have a model that uses a GRU and JZS1 (which seems to work better) and an embedding layer, and that gives comparable performance. I did try merging a recurrent network and a CNN like the above, but I got an error, so it doesn't seem to like that. I'd be interested if anyone manages to figure out how to do that.

ameasure commented 9 years ago

@simonhughes22 glad you got it working and thanks for sharing your code. I hope to try out @lukedeo's approach soon, but I want to figure out how to modify it to fine-tune pre-trained word vectors first. Kim's work (http://arxiv.org/pdf/1408.5882v2.pdf) suggests fine-tuning an existing embedding is better than starting from scratch.

One thing that doesn't make sense to me however is convolving along the embedding dimension. That's not what other researchers have done and I can't think of any reason why the embedding vectors would be spatially related. Have you compared their approach to a purely 1 dimensional convolution along only the n_grams?

simonhughes22 commented 9 years ago

@ameasure it's only a 2D convolution as the vectors are stacked vertically and you're convolving across the full depth, so it's just doing a convolution over multiple entire word vectors. Convolving parts of vectors wouldn't make sense, as the ordering of the vector elements is arbitrary. @lukedeo references a paper above if you want to know more about it. So it's just taking convolutions of different sizes for different ngrams and concatenating them together into one beastie of a model. For my data a single convolution using embeddings (which are essentially an additional convolution over words) works as well so far, as does a GRU for the most part, although its performance is less predictable.

At some point I'll try using the GloVe or word2vec vectors as seeds for the embedding layer. What might be a good idea is to take pre-trained vectors as inputs and merge them with randomly seeded vectors, preventing the pre-trained vectors from being updated by feeding them in directly as inputs. I think that's doable with the Merge layer, as you essentially feed it two datasets: one could be the vectors, the other could be ids fed into an embedding layer. They do that in some of the Stanford papers building off the GloVe work, where they have pre-trained vectors with an extra part of the vector that is updatable and randomly seeded, and supposedly that worked better.
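A rough sketch of that idea, using the later Keras functional API rather than the Sequential/Merge API shown elsewhere in this thread; vocab_size, maxlen, pretrained_matrix and the layer sizes are illustrative assumptions, not anyone's actual setup:

import numpy as np
from keras.models import Model
from keras.layers import Input, Embedding, Conv1D, GlobalMaxPooling1D, Dense, concatenate

vocab_size, maxlen, wv_sz = 10000, 50, 300                      # illustrative sizes
pretrained_matrix = np.random.rand(vocab_size, wv_sz).astype('float32')  # stand-in for GloVe/word2vec rows

word_ids = Input(shape=(maxlen,), dtype='int32')
# frozen branch: pre-trained vectors, never updated during training
frozen = Embedding(vocab_size, wv_sz, weights=[pretrained_matrix], trainable=False)(word_ids)
# trainable branch: randomly seeded vectors that get fine-tuned on the task
learned = Embedding(vocab_size, 64)(word_ids)
# concatenate along the feature axis: each word is now represented by 300 + 64 values
merged = concatenate([frozen, learned])
x = Conv1D(64, 3, activation='relu')(merged)
x = GlobalMaxPooling1D()(x)
output = Dense(1, activation='sigmoid')(x)

model = Model(word_ids, output)
model.compile(loss='binary_crossentropy', optimizer='adam')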

ameasure commented 9 years ago

@simonhughes22 @fchollet @lukedeo I'm trying to implement the CNN text classifiers with embeddings and multiple convolution sizes suggested in your posts above but I keep getting the following error:

Loading data...
8982 train sequences
2246 test sequences
46 classes
X_train shape: (8982L, 50L)
X_test shape: (2246L, 50L)
Convert class vector to binary class matrix (for use with categorical_crossentropy)
Y_train shape: (8982L, 46L)
Y_test shape: (2246L, 46L)
Train on 8083 samples, validate on 899 samples
Epoch 0
Traceback (most recent call last):

  File "<ipython-input-11-60c586a4ee9a>", line 1, in <module>
    runfile('C:/Users/ameasure/Desktop/Programming Projects/cnn/reuters_multi_cnn.py', wdir='C:/Users/ameasure/Desktop/Programming Projects/cnn')

  File "C:\Users\ameasure\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 685, in runfile
    execfile(filename, namespace)

  File "C:\Users\ameasure\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 71, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)

  File "C:/Users/ameasure/Desktop/Programming Projects/cnn/reuters_multi_cnn.py", line 67, in <module>
    model.fit(X=X_train, y=Y_train, batch_size=batch_size, nb_epoch=200, verbose=1, show_accuracy=True, validation_split=0.1)

  File "build\bdist.win-amd64\egg\keras\models.py", line 371, in fit
    validation_split=validation_split, val_f=val_f, val_ins=val_ins, shuffle=shuffle, metrics=metrics)

  File "build\bdist.win-amd64\egg\keras\models.py", line 135, in _fit
    outs = f(*ins_batch)

  File "C:\Users\ameasure\Anaconda\lib\site-packages\theano-0.7.0-py2.7.egg\theano\compile\function_module.py", line 593, in __call__
    self.inv_finder[c]))

TypeError: Missing required input: y

My code is here and I'm using the bleeding edge versions of Theano and Keras. Any idea what's causing this strange error?

simonhughes22 commented 9 years ago

See my notes on your Gist. The input format for the xs is incorrect. It expects a separate 2D array for each Merged sequence model, concatenated into a list.
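A minimal sketch of what the fit call might then look like, assuming each merged branch takes the same padded word-id matrix X_train (as in the model above, where every branch has its own Embedding over the same ids):

# one 2D array of word ids per Merge branch, collected into a list
X_inputs = [X_train for _ in ngram_filters]
model.fit(X_inputs, y_train, batch_size=32, nb_epoch=5, show_accuracy=True, verbose=1)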

lukedeo commented 9 years ago

Also, think about how your embeddings are working @ameasure. Right now, you have an embedding for every n-gram filter size, meaning no shared word vectors. I'd recommend using the Graph class and something like this. NOTE this is untested and uncompiled.

ngram_filters = [2, 3, 4]

graph = Graph()

graph.add_input(name='data')

graph.add_node(Embedding(vocab_size + 1, embedding_size), 
               name='embedding', input='data')

for n_gram in ngram_filters:
    sequential = containers.Sequential()
    sequential.add(Reshape(1, maxlen, embedding_size))
    sequential.add(Convolution2D(nb_feature_maps, 1, n_gram, embedding_size))
    sequential.add(Activation('relu'))
    sequential.add(MaxPooling2D(poolsize=(maxlen - n_gram + 1, 1)))
    sequential.add(Flatten())

    graph.add_node(sequential, name='unit_' + str(n_gram), input='embedding')

graph.add_node(Dropout(0.5), name='dropout', inputs=['unit_' + str(n) for n in ngram_filters])

fc = containers.Sequential()
fc.add(Dense(nb_feature_maps * len(ngram_filters), 15))
fc.add(Activation('sigmoid'))
fc.add(Dense(15, nb_classes))
fc.add(Activation('softmax'))

graph.add_node(fc, name='fully_connected', input='dropout')
graph.add_output(name='output', input='fully_connected')
ameasure commented 9 years ago

@lukedeo @simonhughes22 @fchollet thank you! That fixed the issue and corrected my understanding of what's going on. I am now getting absolutely fantastic results on my dataset by the way, thank you for the help and the wonderful library!

lukedeo commented 9 years ago

@ameasure glad to hear it!

manwinder123 commented 9 years ago

hey guys, great library @fchollet

So I have created a feed-forward neural network using Keras; I think I can call it a deep NN, since I have 3 hidden layers. Sorry if my lingo is off. I am not hitting the accuracy I want on my test set, so I want to see if a CNN can help. I have text that I send to my NN: I send it 100 characters (I normalize the characters by dividing their ASCII integer value by 255), it does its work and comes out with an output of 20. I have a classification problem.

So @ameasure, I am looking at your code and I am a little confused. What would field size be in my case, or what is it in your case and how did you get the numbers? What about stride?

@fchollet I would think that using a 1D CNN makes sense for text (since in my case it would be a 100x1 sized array), but you guys talk about 2D being optimal. Why is that?

Looking at the API for 1D, Convolution1D(input_dim, nb_filter, filter_length, ...), I think my settings would be:

input_dim = 100 (guessing that this is the input size, but you say that this is the number of channels; in pictures there are 3, RGB, so would I set this to 1?)
nb_filter = 25 (I'm not really sure about this one; on the Keras website you write "(dimensionality of the output)", so I guess this is sort of how many outputs the CNN layer will have)
filter_length = completely lost on this one

Looking at the API for 2D, Convolution2D(nb_filter, stack_size, nb_row, nb_col, ...), I think my settings would be:

nb_filter = 25 (see 1D above)
stack_size = 1 (I've seen 3 in some of your examples and I assume that's because the pictures have red, green and blue color channels; my input is text so I think it has only 1 channel?)
nb_row = 1 (my data doesn't have multiple rows, it's just one row with 100 columns)
nb_col = 100 (I have 100 characters, so I guess that's 100 columns)

I have 50 samples to train on and 20 to test on.

Also, there are subsample_length (1D) and subsample (2D) options in the CNN layers, and I have read that subsampling is similar to pooling. If I added the subsample option to my CNN layer, would I skip pooling?

Sorry about all the questions; I've spent hours looking for examples and trying to understand CNNs. I have a good concept of what they are; what's confusing me is the parameters and what I need to set them to. There is also the 1D vs 2D question.

# Convolution layers expect a 4-D input so we reshape our 2-D input
nb_samples = X_train.shape[0]
nb_features = X_train.shape[1]
newshape = (nb_samples, 1, nb_features, 1)
X_train = np.reshape(X_train, newshape).astype(theano.config.floatX)

# We set some hyperparameters
BATCH_SIZE = 16
FIELD_SIZE = 5 * 300
STRIDE = 300
N_FILTERS = 200

# We fit the model
model = Sequential()
model.add(Convolution2D(nb_filter=N_FILTERS, stack_size=1, nb_row=FIELD_SIZE, 
                        nb_col=1, subsample=(STRIDE, 1)))
model.add(Activation('relu'))
model.add(MaxPooling2D(poolsize=(((nb_features - FIELD_SIZE) / STRIDE) + 1, 1)))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(N_FILTERS, nb_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adadelta')
print 'fitting model'
model.fit(X_train, Y_train, nb_epoch=10, batch_size=BATCH_SIZE, verbose=1, 
          show_accuracy=True, validation_split=0.1)

thanks for any help :)

ameasure commented 9 years ago

@manwinder123 Take a look at the notes from the Stanford CNN class here: http://cs231n.github.io/convolutional-networks/, it will introduce you to the lingo. The receptive field is the size of the input sequences we're going to feed through our filters in the convolutional layer. In my case it's 5 * 300 because each of my words has been replaced with a 300 dimension vector, and I want the filters to be applied to every contiguous 5 word sequence in my input. Presumably these filters learn to identify 5 word sequences that are useful for my classification task.

Stride is how far we shift the filters after each application to the input. A stride of 300 means we shift the filter over by one full word vector before applying it to the input again.
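To make the arithmetic concrete, here is a small worked sketch; the 50-word document length is an assumed example, not from the original code:

# With 300-dimensional word vectors concatenated into one long row, a document of
# 50 words gives nb_features = 50 * 300 = 15000 input values.
nb_features = 50 * 300
FIELD_SIZE = 5 * 300   # one filter covers 5 consecutive word vectors
STRIDE = 300           # shift the filter by exactly one word vector each time
n_positions = (nb_features - FIELD_SIZE) // STRIDE + 1   # = 46 filter positions
# which matches the poolsize used in MaxPooling2D above, so the max is taken over
# every position of each filter.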

Regarding the 1d vs. 2d convolutions, it turns out it's all the same. The important thing is that you're shifting your filters across your input in a reasonable manner. Performance is not the same however. When I adopted the approach used by @fchollet and @simonhughes22 and @lukedeo which basically converts a 1d convolution into a 2d convolution I got huge performance improvements. Presumably the underlying implementation is optimized for 2 dimensional convolutions.

simonhughes22 commented 9 years ago

@manwinder123 I think you're making a mistake taking the ASCII values as the inputs. That takes things that are discrete (characters) and converts them into a continuous quantity, which implies that adjacent letters in the alphabet have very similar meanings and letters further apart do not. Instead, what you want to do is either use an embedding layer (pass it a list of ids, one per character, counting from 1 upwards if 0-padded, else from 0, with no gaps in the ids) or a one-hot encoding (a 255-element vector of zeros, with a one at the index of the ASCII value). Secondly, I would stick with a stride of 1,1 and do a convolution over the characters. In fact I'd go one further and recommend you use words rather than characters, with the encoding method described above: assign a unique id to each word, replace the words with their ids, do some zero padding, and pass that to an embedding layer and then a convolutional layer as in the code examples above. Once that's working, you can experiment with merging convolutions of different sizes. The idea then is to have 2D convolutions over the embeddings, where the word embeddings are stacked vertically, so the filter spans the full embedding size in one direction and the number of words you want to convolve over in the other (as in the Convolution2D calls above).

Hope that makes sense. I guarantee that will work much better. I've also had results as good using the GRU and LSTM layers combined with embedding layers, although those run much slower, as theano's scan function (which they rely on) is quite slow.
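A minimal sketch of that word-id preprocessing, with a hypothetical toy corpus of pre-tokenized documents:

from keras.preprocessing.sequence import pad_sequences

docs = [["the", "cat", "sat"], ["a", "dog", "barked", "loudly"]]  # hypothetical tokenized documents

word_to_id = {}
sequences = []
for tokens in docs:
    ids = []
    for w in tokens:
        if w not in word_to_id:
            word_to_id[w] = len(word_to_id) + 1  # start ids at 1, reserving 0 for padding
        ids.append(word_to_id[w])
    sequences.append(ids)

maxlen = max(len(s) for s in sequences)
X = pad_sequences(sequences, maxlen=maxlen)  # zero-padded integer matrix of shape (nb_samples, maxlen)
max_features = len(word_to_id) + 1           # vocabulary size to give the Embedding layer
# X can now be fed to Embedding(max_features, embedding_size) followed by the
# convolution stacks shown earlier in this thread.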

manwinder123 commented 9 years ago

@ameasure Thanks for the link, it clear up a lot of the questions i had

@simonhughes22 thanks for the tip, i'll try it out. hopefully everything goes well :)

appreciate the support :dancer:

xpact commented 9 years ago

See a great example of Convolution1D applied to text classification from fchollet, now in keras: https://github.com/fchollet/keras/blob/master/examples/imdb_cnn.py

manwinder123 commented 9 years ago

Thanks for the link.

So I tried using 1D but it was extremely slow. My input dim size was the size of my input, so if my file had 5000 characters, I set input dim to 5000. I had set the number of filters to 1 (I thought it would help speed it up), but it was way too slow. This was a few weeks ago; I haven't tried recently though.

simonhughes22 commented 9 years ago

5000 characters is way too much to train an LSTM or some other form of recurrent model on. They can learn long-distance relationships, but not that long. I'd either switch to a word model (although that's still likely too large), or use a sliding window approach of some sort.
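A rough numpy sketch of the sliding-window idea; char_ids is a stand-in array of integer character ids for one long document, and the window and step sizes are arbitrary assumptions:

import numpy as np

char_ids = np.random.randint(1, 58, size=5000)  # stand-in for a 5000-character document

window, step = 200, 100
windows = np.array([char_ids[i:i + window]
                    for i in range(0, len(char_ids) - window + 1, step)])
# windows has shape (n_windows, window); each row can be treated as one training sample,
# and a document-level prediction can be made by aggregating over its windows.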

zachmayer commented 8 years ago

I know I'm digging up old code here, but I'm really intrigued by this approach. However, I get errors when trying to use the structure suggested by @fchollet:


max_features = 1000
maxlen = 10

model = Sequential()
model.add(Embedding(max_features, 256)) # embed into dense 3D float tensor (samples, maxlen, 256)
model.add(Reshape((1, maxlen, 256))) # reshape into 4D tensor (samples, 1, maxlen, 256)
# VGG-like convolution stack
model.add(Convolution2D(32, 3, 3, 3, border_mode='full')) 
model.add(Activation('relu'))
model.add(Convolution2D(32, 32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(poolsize=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(1))

Returns:

In [164]: model = Sequential()

In [165]: model.add(Embedding(max_features, 256)) # embed into dense 3D float tensor (samples, maxlen, 256)

In [166]: model.add(Reshape((1, maxlen, 256))) # reshape into 4D tensor (samples, 1, maxlen, 256)

In [167]: # VGG-like convolution stack

In [168]: model.add(Convolution2D(32, 3, 3, 3)) 
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/usr/local/lib/python3.5/site-packages/numpy/core/fromnumeric.py in prod(a, axis, dtype, out, keepdims)
   2481         try:
-> 2482             prod = a.prod
   2483         except AttributeError:

AttributeError: 'tuple' object has no attribute 'prod'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-168-9b06d1cea8a5> in <module>()
----> 1 model.add(Convolution2D(32, 3, 3, 3))

/usr/local/lib/python3.5/site-packages/keras/layers/containers.py in add(self, layer)
     68         self.layers.append(layer)
     69         if len(self.layers) > 1:
---> 70             self.layers[-1].set_previous(self.layers[-2])
     71             if not hasattr(self.layers[0], 'input'):
     72                 self.set_input()

/usr/local/lib/python3.5/site-packages/keras/layers/core.py in set_previous(self, layer)
     96         assert self.nb_input == layer.nb_output == 1, 'Cannot connect layers: input count and output count should be 1.'
     97         if hasattr(self, 'input_ndim'):
---> 98             assert self.input_ndim == len(layer.output_shape), ('Incompatible shapes: layer expected input with ndim=' +
     99                                                                 str(self.input_ndim) +
    100                                                                 ' but previous layer has output_shape ' +

/usr/local/lib/python3.5/site-packages/keras/layers/core.py in output_shape(self)
    764     @property
    765     def output_shape(self):
--> 766         return (self.input_shape[0],) + self._fix_unknown_dimension(self.input_shape[1:], self.dims)
    767 
    768     def get_output(self, train=False):

/usr/local/lib/python3.5/site-packages/keras/layers/core.py in _fix_unknown_dimension(self, input_shape, output_shape)
    752                 known *= dim
    753 
--> 754         original = np.prod(input_shape, dtype=int)
    755         if unknown is not None:
    756             if known == 0 or original % known != 0:

/usr/local/lib/python3.5/site-packages/numpy/core/fromnumeric.py in prod(a, axis, dtype, out, keepdims)
   2483         except AttributeError:
   2484             return _methods._prod(a, axis=axis, dtype=dtype,
-> 2485                                   out=out, keepdims=keepdims)
   2486         return prod(axis=axis, dtype=dtype, out=out)
   2487     else:

/usr/local/lib/python3.5/site-packages/numpy/core/_methods.py in _prod(a, axis, dtype, out, keepdims)
     33 
     34 def _prod(a, axis=None, dtype=None, out=None, keepdims=False):
---> 35     return umr_prod(a, axis, dtype, out, keepdims)
     36 
     37 def _any(a, axis=None, dtype=None, out=None, keepdims=False):

TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

What am I doing wrong?

(Note that I added another pair of parentheses to the Reshape layer, and took out the border_mode='full' in the first convolution.)

wbars commented 8 years ago

Yeah, are there any examples of using a CNN with simple 1D inputs? (X_train would be an n_samples x n_features matrix, I guess.) I just want to organize the first CNN layer to have n_samples inputs.

ameasure commented 8 years ago

I put together an example using one hot inputs here: https://gist.github.com/ameasure/985c87bb8b34ac30269f

One-hot text inputs work surprisingly well, especially for LSTMs.

eshijia commented 8 years ago

@simonhughes22 hi, regarding the code below:

for n_gram in ngram_filters:
    sequential = Sequential()
    conv_filters.append(sequential)

    sequential.add(Embedding(max_features, embedding_size))
    sequential.add(Reshape(1, maxlen, embedding_size))
    sequential.add(Convolution2D(nb_feature_maps, 1, n_gram, embedding_size))
    sequential.add(Activation("relu"))
    sequential.add(MaxPooling2D(poolsize=(maxlen - n_gram + 1, 1)))
    sequential.add(Flatten())

model = Sequential()
model.add(Merge(conv_filters, mode='concat'))
model.add(Dropout(0.5))
model.add(Dense(nb_feature_maps * len(conv_filters), 1))
model.add(Activation("sigmoid"))

Was your input shape (to the Embedding layer) still (nb_samples, 1, max_len, vector_size)?

brianlow commented 8 years ago

@eshijia I think the input shape is (nb_samples, maxlen), where nb_samples is the number of sentences and maxlen is the number of words per sentence. An example might be:

[ 
  [1, 2, 3],    # Bob eyed Alice
  [3, 4, 1]     # Alice uppercut Bob
]
ddofer commented 8 years ago

Any chance of updating this for current 1D convolutions and API? (I'm running into issues when trying to translate this..)

Thanks!

vinayakumarr commented 8 years ago

I have a data set of 393,021 rows with 41 features, classified into 23 classes. I have used the keras imdb_cnn.py example on my data set but I am only able to get 52 percent accuracy. Could you please advise on how to increase the accuracy for my data set?

ddofer commented 8 years ago

I have a similar case: validation accuracy remains at about the baseline distribution, despite the use of various optimizers, dropout, filter sizes, and with or without pooling.

ishalyminov commented 8 years ago

Hi all,

I have an issue with basically the same task: minibatches of fixed-length sequences of one-hots --> sequences of embeddings --> 1D convolution (chopping 3-grams). I'm using the Keras functional API. Here's what I've got:

input_layer = Input(name='input', shape=(sequence_length, one_hot_size))
emb = Embedding(one_hot_size, embedding_size, name='embedding')(input_layer)
conv = Convolution1D(64, 3, name='conv')(emb)

and I get this error with the emb layer:

Exception: Input 0 is incompatible with layer conv: expected ndim=3, found ndim=4

Could you please tell me what is the problem and how do I control the shape of each layer in this case?

UPD: it turns out the Embedding layer works fine with just the integer indices of the one-hots, and I was passing it the one-hot vectors themselves, which messed up the shapes.
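For reference, a minimal sketch of the shapes that work; the sizes are illustrative, and the key point is that the input is a 2D batch of integer ids, not one-hot vectors:

from keras.layers import Input, Embedding, Convolution1D

sequence_length, vocab_size, embedding_size = 50, 1000, 64     # illustrative sizes
input_layer = Input(shape=(sequence_length,), dtype='int32')   # integer indices, shape (batch, sequence_length)
emb = Embedding(vocab_size, embedding_size)(input_layer)       # -> (batch, sequence_length, embedding_size), ndim=3
conv = Convolution1D(64, 3, name='conv')(emb)                  # 1D convolution over 3-grams now sees the expected ndim=3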

rfalba commented 7 years ago

hi @simonhughes22, could you try to explain how your model would change if one wants to use documents as input, represented as a stack of sentence matrices, where each sentence matrix is the word vectors stacked as you pointed out? One training sample would in this case be a 3D tensor.

dupsys commented 7 years ago

hi all, I am doing my best to implement this Soroush et al. paper (https://arxiv.org/pdf/1607.07514.pdf) for embedding and representation. Here is the snippet of my code for vectorization:

maxSequenceLength = 1 + max([len(x.split(" ")) for x in captures_text])
inputChars = np.zeros((len(captures_text), maxSequenceLength))
nextChars = np.zeros((len(captures_text), maxSequenceLength))
print('Prepare the dataset for input and output pairs encoded as integers....')
for i in range(0, len(captures_text), 3):
    inputChars[i, 0] = char_2_id['S']
    try:
        nextChars[i, 0] = char_2_id[captures_text[i][0]]
    except IndexError:
        pass
    for j in range(1, maxSequenceLength):
        if j < len(captures_text[i]) + 1:
            inputChars[i, j] = char_2_id[captures_text[i][j - 1]]
            if j < len(captures_text[i]):
                nextChars[i, j] = char_2_id[captures_text[i][j]]
            else:
                nextChars[i, j] = char_2_id['E']
        else:
            inputChars[i, j] = char_2_id['E']
            nextChars[i, j] = char_2_id['E']

and I build the model as:

inputs = Input(shape=(180,))
inputs_f = Flatten()(inputs)

embedded_layer = Embedding(200, embedding_dim, input_length=180)(inputs)
l_conv1 = Convolution1D(nb_filters, filter_length=filter_sizes[0], border_mode='valid', activation='relu')(embedded_layer)
l_pool1 = MaxPooling1D(pool_length=pooling_size)(l_conv1)
l_conv2 = Convolution1D(nb_filters, filter_length=filter_sizes[0], border_mode='valid', activation='relu')(l_pool1)
l_pool2 = MaxPooling1D(pool_length=pooling_size)(l_conv2)
l_conv3 = Convolution1D(nb_filters, filter_length=filter_sizes[0], border_mode='valid', activation='relu')(l_pool2)
l_conv4 = Convolution1D(nb_filters, filter_length=filter_sizes[0], border_mode='valid', activation='relu')(l_conv3)
l_pool4 = MaxPooling1D(pool_length=2)(l_conv4)

lstm_1 = LSTM(256, return_sequences=False)(l_pool4)
l_in_rep = RepeatVector(180)(lstm_1)
# output size (None, 280, 256)

l_decoder_1 = LSTM(256, return_sequences=True)(l_in_rep)
l_decoder_2 = LSTM(256, return_sequences=True)(l_decoder_1)
# output size (None, 280, 256)

fc_layer_1 = Dense(68, activation='relu')(l_decoder_2)
drop1_out = Dropout(0.5)(fc_layer_1)
flat = Flatten()(drop1_out)
out = Dense(180, activation='softmax')(flat)
model = Model(inputs, out)

# Compilation
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

On training the model, I get very high loss values with small accuracy. Please tell me if I am doing something wrong; any suggestion to fix it would be appreciated. For example:

i = 0
Epoch 1/5
{'acc': array(0.0, dtype=float32), 'loss': array(320.0159606933594, dtype=float32), 'batch': 0, 'size': 32}
1/20000 [..............................] - ETA: 46478s - loss: 320.0160 - acc: 0.0000e+00

AlexPapas commented 6 years ago

try this: https://github.com/pinkeshbadjatiya/twitter-hatespeech/blob/master/cnn.py