Closed phipleg closed 4 years ago
I would find this useful.
Ok, great! Before I make the pull request I try to figure out the implementation in tensorflow in order to get the ChainCRF going with both backends.
I'm interested; I would find this very useful.
Following as well. I was about to embark on an implementation myself. (May still just for the exercise, but if you have something I would be interested.)
Interested as well! ;)
This is awesome!
This will be great!
Thanks for your support! I am almost done, except for a bug in my tensorflow implementation. I hope to resolve this issue in a few days.
It will be an awesome function!
Great, waiting for update!
Sorry for the delay. I am still working on it, but in my spare time. The layer is complete, but the example is not finished yet.
I'm interested as well. My problem is that RNNs do not capture e.g. BIO encoding correctly and produce ill-formed BIO tags (e.g. starting an I-tag without a previous B-tag).
Thanks for contributing and looking forward to your implementation.
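To make the BIO problem above concrete, here is a minimal sketch of a check for ill-formed tag sequences. The helper name `is_valid_bio` is hypothetical (it is not part of Keras or of the proposed layer), and it assumes tags of the form `B-NP`, `I-NP` and `O`.

```python
def is_valid_bio(tags):
    """Return True if every I-X tag directly follows a B-X or I-X of the same type."""
    prev = "O"  # treat the position before the sentence as "outside"
    for tag in tags:
        if tag.startswith("I-"):
            chunk_type = tag[2:]
            # An I- tag is only legal right after B- or I- of the same chunk type.
            if prev not in ("B-" + chunk_type, "I-" + chunk_type):
                return False
        prev = tag
    return True


print(is_valid_bio(["B-NP", "I-NP", "O"]))  # well-formed sequence
print(is_valid_bio(["O", "I-NP", "O"]))     # I-tag without a previous B-tag
```

A check like this can be run on the predictions of a plain RNN tagger to count how often it produces such sequences.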
python3 conll2000_bi_lstm_crf.py
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
Traceback (most recent call last):
File "conll2000_bi_lstm_crf.py", line 16, in
I have run the setup.py in https://github.com/fchollet/keras/tree/bba6b521abc462261dd65883be59c94e1467b7cf and I can see 'crf.py' in /keras/layers/, but when I ran '/examples/conll2000_bi_lstm_crf.py', I got the ImportError.
What is the right way to run this file?
This should work if the library is properly installed. I guess you had a previous keras version in your conda environment. Then your install didn't update existing files but just added the new ones. An example is keras/layers/__init__.py, which is likely to be the source of the error.
Try again:
python setup.py install --force
You can check the installation by running
python3 -c "from keras.layers import ChainCRF"
If this doesn't throw an ImportError, the example should work.
Thanks. After uninstalling my previous keras version, I successfully imported the package.
Hi, I'm running a BLSTM-CRF model, but before the training begins I get the following error:
Train on 1860255 samples, validate on 206696 samples
Epoch 1/5
Traceback (most recent call last):
File "train_keras_model.py", line 125, in
and my model is as follows:
model = Sequential()
model.add(Embedding(output_dim = embeddingDim, input_dim = vocabSize + 1, input_length = maxlen, mask_zero = False, weights = [embeddingWeights]))
model.add(Bidirectional(LSTM(output_dim = hiddenDims, return_sequences = True), merge_mode = 'concat'))
model.add(Dropout(dropout))
model.add(TimeDistributed(Dense(outputDims)))
crf = ChainCRF()
model.add(crf)
model.compile(loss = crf.loss, optimizer = 'adam', metrics = ["accuracy"])
Before that I added a TimeDistributed wrapper to make the input dimension of the CRF correct. But I don't know what this error means. Could somebody help me?
In your setting, the targets must be one-hot encoded and hence of dimension 3 (and not 2), i.e:
Y_test.shape = (nb_samples, timesteps, nb_classes)
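As a minimal sketch of the target shape described above: if the labels are already available as one integer per time step, i.e. an array of shape (nb_samples, timesteps), they can be one-hot encoded with plain NumPy. The helper name `one_hot_sequences` and the toy data are assumptions for illustration only.

```python
import numpy as np


def one_hot_sequences(labels, nb_classes):
    """Map integer labels (nb_samples, timesteps) to one-hot (nb_samples, timesteps, nb_classes)."""
    labels = np.asarray(labels)
    # Row i of the identity matrix is the one-hot vector for class i;
    # indexing with the label array adds a trailing class axis.
    return np.eye(nb_classes, dtype="float32")[labels]


y = np.array([[0, 2, 1],
              [1, 1, 0]])            # 2 samples, 3 time steps
Y = one_hot_sequences(y, nb_classes=3)
print(Y.shape)                       # (2, 3, 3)
```

This only works when there is a label for every time step; it does not turn a single label per sample into a sequence of labels.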
Thanks for the reply but I'm not very clear about the shape.
After my preprocessing I use
Y_test = np_utils.to_categorical(test_y, LabelDims)
So one batch of Y has the size (batchsize, LabelDims), i.e. (nb_samples, nb_classes).
So how can I get from (nb_samples, nb_classes) to (nb_samples, timesteps, nb_classes)? Is there a function for this, or do I need to change the preprocessing step?
Thanks a lot.
You cannot make the desired dimension transition. The model works only for temporal data, but your preprocessing shows that this is not true in your case. Why are you trying to use a ChainCRF?
I use LSTM and CRF for Chinese sequence segmentation. In my preprocessing I slide a window over the sentence, so the training data has X.size = (total_count, window_size) and Y.size = (total_count, ).
It seems that I should use a sequence length as timesteps, so that the data shape becomes X.size = (batch_size, seq_len, window_size) and Y.size = (batch_size, seq_len).
But I have used an embedding layer, which makes the input to the LSTM layer have shape (batch_size, window_size, embedding_dim). The network works well and gives good results because it takes window_size as the time steps, with return_sequences = False set in the last LSTM layer, which makes the dimensions of the network's output and Y match.
I also notice that the ChainCRF layer doesn't support the mask_zero argument yet. So does this mean I should discard the embedding layer and reshape the data to X.size = (batch_size, seq_len, window_size) and Y.size = (batch_size, seq_len, n_classes) (after converting to categorical)?
Thanks for the advice. I have already fixed the problem by discarding the embedding layer and reshaping the data dimensions.
Can I use this ChainCRF to implement a BiLSTM with CRF for NER tagging, as shown in the code here: https://github.com/glample/tagger ?
I got this error. Any help?
Loading data...
Unique words: 17260
Unique pos_tags: 45
Unique chunk tags: 24
X_words_train shape: (8936, 80)
X_words_test shape: (2012, 80)
y_train shape: (8936, 80, 1)
y_test shape: (2012, 80, 1)
Build model...
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 569, in call
self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 632, in add_inbound_node
Node.create_node(self, inbound_layers, node_indices, tensor_indices)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 164, in create_node
output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
File "/usr/local/lib/python2.7/dist-packages/keras/layers/crf.py", line 122, in call
y_pred = K.crf_inference(x, self.U, self.b)
AttributeError: 'module' object has no attribute 'crf_inference'
Hi @SamihYounes,
Please update to the latest version of the pull request #4621.
Hi @LopezGG,
yes this will be possible. You can start with the chunking example. When the CRF layer is finally merged, I will provide an example.
I have a question about the example code
n_words = 10000
maxlen = 32
(X_train, y_train), (X_test, y_test) = load_treebank(nb_words=n_words, maxlen=maxlen)
n_samples, n_steps, n_classes = y_train.shape
model = Sequential()
model.add(Embedding(n_words, 128, input_length=maxlen, dropout=0.2))
model.add(LSTM(64, return_sequences=True))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(n_classes)))
model.add(Dropout(0.2))
crf = ChainCRF()
model.add(crf)
model.compile(loss=crf.loss, optimizer='rmsprop', metrics=['accuracy'])
After the LSTM layer, the Dense layer converts each timestep of the LSTM output to n_classes dimensions. For instance, the output of the LSTM is (batch_size, timestep, lstm_output), e.g. (64, 3, 100). Why should this output be converted to (64, 3, nb_class) using a Dense layer? Couldn't data of shape (64, 3, 100) be the direct input to the CRF layer, which would then produce an output of shape (64, 3, nb_class)?
The CRF could take features of size 100 for each timestep and then output tags of size 8, right?
Dear @JacobIsrael123,
Of course, we could integrate a dense layer for input dimension conversion, but this is not always necessary (for example, if the preceding layer is a recurrent layer with output dimension nb_classes). While designing the ChainCRF layer, I decided to keep it as simple as possible.
Where does load_treebank come from? What is the shape of the data? @JacobIsrael123
Hello, @phipleg:
I want to try @JacobIsrael123's idea: let the output of every timestep be the direct input to the CRF layer.
Can I add CRF layer like this:
model.add(TimeDistributed(ChainCRF()))
And in this case, what is the shape of the CRF layer's output at every timestep? A 4-dimensional vector (if my data has 4 labels, like 'B I E O')?
My understanding is that the CRF layer takes the place of a dense layer like this: model.add(TimeDistributed(Dense(Y_train.shape[2], activation='softmax'))) (using softmax activation to handle some simple sequence labeling tasks). Am I right?
Hi @Ethan1214,
The ChainCRF is not an activation function that you can apply manually on each timestep. The time dimension must be part of the input. In fact, the input and output shape of the ChainCRF are identical and equal to (batch_size, maxlen, n_classes).
For example:
vocab_size = 20
n_classes = 11
model = Sequential()
model.add(Embedding(vocab_size, n_classes))
layer = crf.ChainCRF()
model.add(layer)
model.compile(loss=layer.sparse_loss, optimizer='sgd')
batch_size, maxlen = 2, 5
x = np.random.randint(1, vocab_size, size=(batch_size, maxlen))
y = np.random.randint(n_classes, size=(batch_size, maxlen))
y = np.expand_dims(y, 2)
model.train_on_batch(x, y)
Hi @phipleg, I find your implementation of ChainCRF great! I am having trouble getting my tagging model to work. Forgive me if my question is trivial - I am a newbie in keras :) So basically I would like to compare a very simple model for tagging (with layers Embedding - LSTM - TimeDistributed Dense) with and without a CRF layer on top. Something like
sentences = Input((maxlen,))
word_embeddings = Embedding(input_dim = (vocab_size + 1), output_dim = embedding_size, input_length = max_length)(sentences)
lstm_output = LSTM(output_dim = hidden_size, return_sequences = True)(word_embeddings)
crf_scores = TimeDistributed(Dense(n_classes))(lstm_output)
crf = ChainCRF()
probabilities = crf(crf_scores)
loss = crf.loss
model = Model(input = [sentences], output = probabilities)
model.compile(metrics = [categorical_accuracy], sample_weight_mode = "temporal", loss = loss, optimizer = "sgd")
My model works fine when I do not use any CRF layer on top (i.e. with a TimeDistributed Dense layer and categorical crossentropy as loss).
However, when trying to add the CRF layer I get the following error:
ValueError: Input dimension mis-match. (input[0].shape[1] = BATCH_SIZE, input[1].shape[1] = MAX_LEN)
Did you also have the same problem in the past?
Hi @monod91,
unfortunately, sample_weight_mode = "temporal" is not supported for the ChainCRF. This might cause your problem.
Hi @phipleg , thank you for your answer.
At the beginning I thought it was an import problem - I could not import keras with the new CRF layer via pip, so I manually copied the scripts crf.py and __init__.py to my keras folder. I thought this could have caused a dependency problem - can you confirm that I do not have to copy other scripts to the keras folder?
So you are not using weights? I was using them for kind of 'masking' the padding (I didn't want to use mask_zero=True in the Embedding layer because I wanted to be able to eventually also use convolutional layers, which do not support masking). May I ask how you solved the problem of masking, then?
I moved my work on the CRF to the crf branch in my fork (https://github.com/phipleg/keras/tree/crf), but it is not fully Keras 2 compliant yet (persistence is not supported and mini-batches of size 1 are problematic).
Correct, I either use a mask or ignore the variable length issue. If I am not mistaken, a loss function in Keras returns a tensor of shape (batch_size, timesteps), i.e. a loss value per sample and time step. This is mapped to a tensor of shape (batch_size, ) by multiplying by the sample_weight array and then taking the average along the time axis. In contrast, the loss functions of a ChainCRF work differently and return a tensor of shape (batch_size, ).
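The shape difference described above can be illustrated with plain NumPy arrays. This is only a sketch of the shapes involved, not Keras internals; the variable names and the random loss values are made up for illustration.

```python
import numpy as np

batch_size, timesteps = 2, 4
rng = np.random.default_rng(0)

# A per-time-step loss: one value per sample and time step.
per_step_loss = rng.random((batch_size, timesteps))

# Temporal sample weights: 1 for real tokens, 0 for padding.
sample_weight = np.array([[1, 1, 1, 0],
                          [1, 1, 0, 0]], dtype=float)

# Weight each time step, then average along the time axis -> shape (batch_size,).
weighted = (per_step_loss * sample_weight).mean(axis=1)
print(weighted.shape)

# A sequence-level loss already has one value per sample, so there is
# no time axis left that temporal weights could be applied to.
sequence_loss = rng.random(batch_size)
try:
    sequence_loss * sample_weight   # shapes (2,) and (2, 4) do not broadcast
    mismatch = False
except ValueError:
    mismatch = True
print(mismatch)
```

This kind of shape mismatch may be what lies behind the "Input dimension mis-match" error reported earlier in the thread, although the thread does not confirm it.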
Hi, I was using the previous CRF layer implementation and it was working fine. Then I switched to keras 2.0.0 and the new CRF implementation in https://github.com/phipleg/keras/tree/crf. The new implementation produced unacceptable results. Then I ran the conll2000_bi_lstm_crf.py example and made the same observation there. There seems to be an issue in the loss functions: it produces 'loss: nan' for each epoch, e.g. 8936/8936 [==============================] - 157s - loss: nan - sparse_categorical_accuracy: 0.7038
Any ideas? Am I doing something wrong?
@phipleg Can you share the document/tutorial you used for implementing ChainCRF module? Thanks.
The above issue with loss value is only observed with Theano backend. Tensorflow works fine.
I'm new to GitHub. How can I run the setup in https://github.com/fchollet/keras/tree/bba6b521abc462261dd65883be59c94e1467b7cf?
If I use CNTK, can I still use the CRF layer?
@liyzhang
python setup.py install --force
I am working on a sequence labeling task based on a bi-directional LSTM architecture with variable sequence length (I'm not padding sentences). Thus, during training, I have a lot of mini-batches, including some of size 1. @phipleg said in a previous post that "mini batches of size 1 are problematic". Does this mean that this implementation won't work in such a situation?
@dfalci This was fixed in a later commit, now the CRF implementation works fine for mini-batches of size 1.
A hint on speeding up your idea: what I do is group sentences by sentence length and then create mini-batches of sentences with the same length. If your training data is large enough and the sentences are approximately the same length, you will have only a few mini-batches with a single sentence.
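The grouping idea can be sketched in a few lines of plain Python. The function name `batches_by_length` and the toy sentences are hypothetical; in practice the labels would need to be grouped together with the sentences.

```python
from collections import defaultdict


def batches_by_length(sentences, batch_size):
    """Yield mini-batches in which all sentences have the same length."""
    buckets = defaultdict(list)
    for sent in sentences:
        buckets[len(sent)].append(sent)   # one bucket per sentence length
    for length in sorted(buckets):
        group = buckets[length]
        # Cut each bucket into chunks of at most batch_size sentences.
        for start in range(0, len(group), batch_size):
            yield group[start:start + batch_size]


sentences = [["a", "b"], ["c"], ["d", "e"], ["f", "g"], ["h"]]
for batch in batches_by_length(sentences, batch_size=2):
    print(len(batch[0]), batch)
```

Only the last chunk of each bucket can be smaller than batch_size, so single-sentence batches occur at most once per distinct sentence length.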
@phipleg I'm interested in using your implementation, but am wondering if you could elaborate on what this means:
The layer is the identity function during training and applies the forward-backward algorithm during inference. For that it holds a set of trainable parameters which is accessed by a specific loss function.
What I think this means is that the second-from-last layer, the one just before the CRF, is actually trying to predict the target directly, and learns to do so based on a loss function that is distinct from the CRF (e.g. cross-entropy). Meanwhile, the CRF learns a set of transition probabilities, based on its own loss function --- the log likelihood calculated from the forward-backward algorithm.
So, training of the CRF could be decoupled from training from the rest of the network, since the CRFs parameters do not affect the loss function as seen by the rest of the network ("The layer is the identity function during training").
Have I understood correctly?
While this seems reasonable, my reading of Bidirectional LSTM-CRF Models for Sequence Tagging (Huang et al 2015) and Neural Architectures for Named Entity Recognition (Lample 2016) is that their Bi-LSTM-CRF implementations are trained by back-propagating the CRF's log-likelihood loss function through the entire network. (I could be mistaken of course.)
@enewe101 I can maybe answer that.
The CRF layer is updated by back-propagation during training to learn the transition probabilities. However, during training we already know the correct labels. Hence, a CRF or Hidden Markov Model layer in a neural network behaves differently during training than at inference. But the error function of the CRF layer is used for training, and the transitions are updated with each epoch.
You could of course also decouple the training of the network from the training of the CRF / HMM. But this is seldom done as it introduces further complexity.
The paper by Collobert et al., 'NLP almost from scratch', explains well how to add an HMM to a network and how training and inference must be modified.
In my implementation (https://github.com/UKPLab/emnlp2017-bilstm-cnn-crf) I achieve results on par with Huang et al., Lample et al., and Ma & Hovy for various tasks using this CRF implementation. So it appears that this CRF implementation works well.
Hi @enewe101,
as @nreimers pointed out, the CRF layer only applies the costly inference at prediction time, and not during training, because the target labels are already known. During training it acts as the identity but at the same time holds the parameters for the CRF loss. You need to use this loss in your model (and not some cross-entropy as you suggested). Otherwise, when taking gradients, the CRF parameters won't get any updates.
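To illustrate why the CRF parameters only receive gradients through the CRF loss, here is a minimal NumPy sketch of a linear-chain CRF negative log-likelihood for a single sequence. It is a textbook formulation, not the code of the ChainCRF layer: the function names are made up, `x` stands for the per-time-step scores coming from the network, `U` for the transition scores, and boundary terms are omitted for brevity.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)


def crf_nll(x, U, y):
    """Negative log-likelihood of tag sequence y.

    x: (timesteps, n_classes) unary scores
    U: (n_classes, n_classes) transition scores, U[i, j] for i -> j
    y: (timesteps,) gold tag indices
    """
    y = np.asarray(y)
    T = x.shape[0]
    # Score of the gold path: unary scores plus transition scores.
    path_score = x[np.arange(T), y].sum() + U[y[:-1], y[1:]].sum()
    # Forward recursion: alpha[j] is the log-sum of scores of all paths ending in tag j.
    alpha = x[0]
    for t in range(1, T):
        alpha = logsumexp(alpha[:, None] + U, axis=0) + x[t]
    log_z = logsumexp(alpha, axis=0)   # log partition function
    return log_z - path_score


x = np.zeros((3, 2))
U = np.zeros((2, 2))
print(crf_nll(x, U, [0, 1, 0]))   # all 8 paths equally likely -> 3 * log(2)
```

Both `x` and `U` appear in the loss, so minimising it updates the network and the transitions together. A per-time-step cross-entropy on `x` alone never touches `U`.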
thanks @nreimers and @phipleg !
@phipleg I see that typically the CRF layer loss is applied to a single output layer. Is it possible to have two outputs, as in the functional API demo, while using the CRF losses from two different CRFs?
Hi @chaxor,
have you tried already something like this?
input_for_crf1 = ...
input_for_crf2 = ...
crf1 = ChainCRF(params_for_crf1)
crf2 = ChainCRF(params_for_crf2)
out1 = crf1(input_for_crf1)
out2 = crf2(input_for_crf2)
model = Model(inp, [out1, out2])
model.compile(optimizer = ...., loss = [crf1.loss, crf2.loss])
@phipleg Well, that was a simple fix. Thank you so much for your help!
Hey, I just used the newest ChainCRF layer, but the result is strange. The accuracy on the training set first increases and then decreases, while the validation accuracy increases continuously. I am not clear about that, can you explain it? @phipleg here is the training process.
Dear @KARABAER,
it is hard to say without knowing your complete model, the data and training code. Please give more details.
I implemented a Linear Chain CRF layer for sequence tagging tasks inspired by the paper:
Lample et al., Neural Architectures for Named Entity Recognition
The layer is the identity function during training and applies the forward-backward algorithm during inference. For that it holds a set of trainable parameters which is accessed by a specific loss function.
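As a rough illustration of what inference over a chain adds compared to picking the best tag at each time step independently, here is a NumPy sketch of Viterbi (max-product) decoding. This is a standard textbook algorithm and an assumption on my side; it is not taken from the layer's code, and the thread does not state which decoding variant the layer uses. The names `viterbi_decode`, `x` and `U` are illustrative.

```python
import numpy as np


def viterbi_decode(x, U):
    """Return the highest-scoring tag sequence.

    x: (timesteps, n_classes) unary scores
    U: (n_classes, n_classes) transition scores, U[i, j] for i -> j
    """
    T, K = x.shape
    score = x[0].copy()
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + U          # cand[i, j]: best path ending in i, then i -> j
        backptr[t] = cand.argmax(axis=0)   # best previous tag for each current tag
        score = cand.max(axis=0) + x[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):          # follow the back-pointers from the end
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]


# Tags: 0 = O, 1 = B, 2 = I. Strongly penalise the transition O -> I.
U = np.zeros((3, 3))
U[0, 2] = -1e4
x = np.array([[2.0, 0.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
print(x.argmax(axis=1).tolist())   # [0, 2, 2]: O I I, an I-tag right after O
print(viterbi_decode(x, U))        # [0, 1, 2]: O B I
```

With learned transition scores, sequences such as an I-tag directly after O become unlikely, which is the behaviour people in this thread are asking for.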
You can see the API in the short gist for pos tagging on the penn treebank (provided by NLTK): https://gist.github.com/phipleg/adfccb0ad96b777eecc9bb0f16ab54fc
Currently, it is only implemented in Theano and supports fixed length sequences (no masking).
Is anybody interested in seeing the layer in Keras? The need was raised in issue 824 but the issue is closed.
I could refactor my code and make a pull request in a few days. For that I would need to add a few functions to the Theano backend because I make use of Theano's scan function. I could also provide an example.