keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

Character Level CNN based features concatenation with Word Embeddings #1400

Closed · napsternxg closed this issue 8 years ago

napsternxg commented 8 years ago

I am trying to implement the Character CNN + LSTM model presented in this paper (http://arxiv.org/abs/1511.08308) for Named Entity Recognition using Keras. However, I am facing issues with including the character-based CNN features along with the word embeddings in the model.

The basic code for my model, a Bidirectional LSTM built with the Keras Graph model, is here: https://github.com/napsternxg/DeepSequenceClassification

However, for each word I also want to add CNN-based features computed on the character embeddings to the model input. I know I can do this by creating a new input node, adding a CNN layer to it, and then merging the Convolution + Pooling output with the embedding features, but I am facing issues with defining the model input. For each word in the sequence I will have a sequence of characters, which I can pad to, say, max_word_char_size characters. That makes the input 3-dimensional, (samples, maxlen, max_word_char_size); after passing through a character embedding layer it becomes 4-dimensional, (samples, maxlen, max_word_char_size, char_embedding_size), and I am not sure how to handle this.

I have come up with the following hack using different dim_ordering modes and Reshape layers to ensure I get 1 vector per word in each sequence. These vectors will then be concatenated with the embedding of each word.

import theano, keras
from keras.models import Sequential, Graph
from keras.layers.core import Dense, Dropout, Activation, TimeDistributedDense, Flatten, Merge, Reshape
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import LSTM, GRU
from keras.layers.convolutional import Convolution1D, MaxPooling1D, Convolution2D, MaxPooling2D
from keras.preprocessing.sequence import pad_sequences

import numpy as np

max_len, max_char_len = 5, 4
char_vocab = 100
char_embedding_size=50
nb_filters = 10
# Sequence words represented as sequence of sequence of characters
X_c = np.array([[[0,1,2,3], [0,1,2,3],[0,1,2,3],[0,1,2,3], [0,1,2,3]],
               [[0,1,2,3], [0,1,2,3],[0,1,2,3],[0,1,2,3], [0,1,2,3]]])
# corresponding sequence of words
X_w = np.array([[1,1,1,1,1], [1,1,1,1,1]])
print X_c.shape, X_w.shape
# Output: (2, 5, 4) (2, 5)

model = Sequential()
model.add(Embedding(char_vocab, char_embedding_size, input_length=max_len*max_char_len))
model.add(Reshape((max_len, max_char_len, char_embedding_size)))
model.add(Convolution2D(nb_filters, 1, 2, dim_ordering='tf', border_mode='same')) # Hack to ensure I get output per word
model.add(MaxPooling2D((2, 2), dim_ordering='th')) # Hack to get output per word
model.add(Reshape((max_len, 10))) # 10 = (max_char_len / 2) * (nb_filters / 2) after the 2x2 pooling, i.e. one 10-dim vector per word

model.compile(optimizer='adam', loss='mse')

X_in = X_c.reshape(2, max_len * max_char_len) # Hack to pass all chars to the embedding layer.
output = model.predict(X_in)
print output.shape
# Outputs: (2, 5, 10)
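
For reference, this is a rough, untested sketch of how I imagine the concatenation with the word embeddings would then look, continuing the code above (same imports); word_vocab, word_embedding_size and nb_tags are made-up values:

word_vocab, word_embedding_size, nb_tags = 1000, 100, 5  # illustrative values

char_model = model  # the Sequential above; output shape (samples, max_len, 10)

word_model = Sequential()
word_model.add(Embedding(word_vocab, word_embedding_size, input_length=max_len))
# word_model output shape: (samples, max_len, word_embedding_size)

combined = Sequential()
combined.add(Merge([char_model, word_model], mode='concat', concat_axis=-1))
# concatenated per-word features: (samples, max_len, 10 + word_embedding_size)
combined.add(LSTM(64, return_sequences=True))
combined.add(TimeDistributedDense(nb_tags))
combined.add(Activation('softmax'))
combined.compile(optimizer='adam', loss='categorical_crossentropy')

# training would then take both inputs, e.g.
# combined.fit([X_c.reshape(len(X_c), max_len * max_char_len), X_w], y)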

I look forward to your feedback on whether this is the correct way to go about it, or whether there is a better way to do it.

napsternxg commented 8 years ago

I have implemented a basic version of it at https://github.com/napsternxg/DeepSequenceClassification in the model.py file and am able to run it. I will close this issue for now; if anyone can suggest a better way, feel free to comment.

Sandy4321 commented 8 years ago

Super, thanks a lot!

ylqfp commented 8 years ago

Nice, thanks a lot!

jayinai commented 8 years ago

@napsternxg Thanks for sharing. For the word embeddings, are we able to use pre-trained word vectors (e.g., from word2vec), and how?

dupsys commented 7 years ago

Thanks for this wonderful post. Suppose I have an input string, say "cu 2morr, gf", and the expected output is "see you tomorrow, girlfriend", i.e. the output is the corresponding sequence for the input. I am very confused about how I should validate the model or predict the output. Please help.

Abbey

napsternxg commented 7 years ago

@dupsys your problem is probably an instance of seq2seq prediction, where the input sequence is "cu 2morr, gf" and the output sequence is "see you tomorrow, girlfriend". In this case the input and output sequence lengths will be different; you might also want to use an attention-based model (used mostly in the machine translation literature).
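
Just to illustrate the shape of the problem, here is a very rough, untested sketch of a plain encoder-decoder (without attention) in the same old Keras API; the vocabulary sizes and sequence lengths are made up:

from keras.models import Sequential
from keras.layers.core import Activation, RepeatVector, TimeDistributedDense
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import LSTM

in_vocab, out_vocab = 60, 60   # character vocabularies (illustrative)
in_len, out_len = 20, 30       # padded input/output lengths (illustrative)

s2s = Sequential()
s2s.add(Embedding(in_vocab, 64, input_length=in_len))
s2s.add(LSTM(128))                  # encoder: one vector per input sequence
s2s.add(RepeatVector(out_len))      # feed that vector at every decoder step
s2s.add(LSTM(128, return_sequences=True))
s2s.add(TimeDistributedDense(out_vocab))
s2s.add(Activation('softmax'))
s2s.compile(optimizer='adam', loss='categorical_crossentropy')
# y would be the one-hot encoded target characters, shape (samples, out_len, out_vocab)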

@ShuaiW right now my code doesn't support this, but you can edit the embedding layer to take the pre-trained embeddings.
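
For example, a rough, untested sketch of seeding the word Embedding layer with pre-trained word2vec vectors; the gensim call, the file name and word_index here are assumptions about your setup:

import numpy as np
from gensim.models import Word2Vec
from keras.models import Sequential
from keras.layers.embeddings import Embedding

w2v = Word2Vec.load_word2vec_format('vectors.bin', binary=True)  # hypothetical file
dim = w2v.vector_size

word_index = {'the': 1, 'cat': 2}     # your word -> integer id mapping
word_vocab = len(word_index) + 1      # +1 for the padding/unknown index 0
embedding_matrix = np.zeros((word_vocab, dim))
for word, i in word_index.items():
    if word in w2v:
        embedding_matrix[i] = w2v[word]

word_model = Sequential()
word_model.add(Embedding(word_vocab, dim, input_length=5,    # input_length = max sentence length
                         weights=[embedding_matrix]))        # initialize from word2vec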

dupsys commented 7 years ago

Thank you @napsternxg. I will look at the implementation of the attention model.

dupsys commented 7 years ago

Hi @napsternxg, please can you give me a hint with the attached code segment? I am trying to implement Tweet2Vec from scratch and add an attention model on top of it.

codesegment.txt

I got this error: Error when checking model input: expected input_1 to have 2 dimensions, but got an array with shape (307551, 180, 68).

monod91 commented 7 years ago

Hi @napsternxg, this is a great feature, thanks for implementing it! I am trying to understand your code, but I am quite confused about how you cope with padding. You pad/cut your sentences to maxlen_sent AND all words inside a sentence to maxlen_char, don't you? So how can you ensure that the padding does not interfere during backprop or when calculating the metrics? Thank you in advance for your answer :)
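
I was wondering whether something like per-timestep sample weights is the intended mechanism; here is a rough, untested guess (model, X, y and mask are placeholders for your model, inputs, targets and a 0/1 padding mask):

model.compile(optimizer='adam', loss='categorical_crossentropy',
              sample_weight_mode='temporal')
# mask: shape (nb_samples, maxlen_sent), 1.0 for real tokens, 0.0 for padding,
# so padded time steps contribute nothing to the loss or the metrics
model.fit(X, y, sample_weight=mask)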

hardikmeisheri commented 7 years ago

@napsternxg have you passed the number of words as the number of channels in the CNN?

williambenhaim commented 7 years ago

Thank you @napsternxg, it's great! Did you find a way to implement the CRF?

iuria21 commented 7 years ago

Hi @napsternxg! Thanks for your great work! There is one thing I don't understand very well: for the input to the character embedding layer, do you have to pad every word to a fixed length? Thanks

kamalkraj commented 6 years ago

Hi @napsternxg, here is an implementation of the Character CNN + LSTM model presented in this paper (http://arxiv.org/abs/1511.08308) for Named Entity Recognition using Keras:

https://github.com/kamalkraj/Named-Entity-Recognition-with-Bidirectional-LSTM-CNNs